Information processing method and terminal device

ABSTRACT

Disclosed are an information processing method and a terminal device. The method comprises: acquiring first information, wherein the first information is information to be processed by a terminal device; calling an operation instruction in a calculation apparatus to calculate the first information so as to obtain second information; and outputting the second information. By means of the examples in the present disclosure, a calculation apparatus of a terminal device can be used to call an operation instruction to process first information, so as to output second information of a target desired by a user, thereby improving the information processing efficiency. The present technical solution has advantages of a fast computation speed and high efficiency.

TECHNICAL FIELD

The present disclosure relates to the technical field of information technology, and particularly to an information processing method and related products.

BACKGROUND

With the growth of information technology and people's ever-increasing demands, the need for timely information is becoming stronger. At present, a terminal obtains and processes information on a general-purpose processor; for instance, image super-resolution, which means improving the resolution of an image, is implemented on a general-purpose processor.

However, in practical applications, this method of obtaining information based on a general-purpose processor may be limited by the operation speed of the general-purpose processor. In particular, when the load of a general-purpose processor is large, the method may lead to low efficiency and high latency of information processing.

SUMMARY

Examples of the present disclosure provide an information processing method and related products, which can increase the processing speed and efficiency of a computation device.

In a first aspect, an example of the present disclosure provides an information processing method which is applied to a computation device, where the computation device includes a communication unit and an operation unit. The method includes:

controlling, by the computation device, the communication unit to obtain a first image to be processed, where the first image has a resolution of a first-level size;

controlling, by the computation device, the operation unit to obtain and call an operation instruction to perform resolution optimization on the first image to obtain a second image, where

the second image has a resolution of a second-level size, the first-level size is smaller than the second-level size, and the operation instruction is a preset instruction for optimizing an image resolution.

In some possible examples, the controlling, by the computation device, the communication unit to obtain a first image to be processed includes:

controlling, by the computation device, the communication unit to obtain an original image to be processed input by a user, where the original image has a resolution of the first-level size, and

controlling, by the computation device, the operation unit to pre-process the original image to obtain the first image to be processed, where the pre-processing is an operation preset by a user side or a terminal side.

In some possible examples, the computation device further includes a register unit and a controller unit, and the controlling, by the computation device, the operation unit to obtain and call an operation instruction to perform resolution optimization on the first image, so as to obtain the second image includes:

controlling, by the computation device, the controller unit to fetch an operation instruction from the register unit, and sending, by the computation device, the operation instruction to the operation unit;

controlling, by the computation device, the controller unit to call the operation instruction to perform feature extraction on the first image to obtain a feature image, and

controlling, by the computation device, the operation unit to pre-process the feature image to obtain the second image, where the pre-processing is an operation preset by a user side or a terminal side.

In some possible examples, the pre-processing includes one or more of the following processing manners: translation, scaling transformation, non-linear transformation, normalization, format conversion, data deduplication, processing of data exception, and data missing filling.
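For illustration only, two of the listed pre-processing manners (data missing filling and normalization) might be sketched as follows. The function names, the use of None to mark missing values, mean filling, and min-max normalization are all assumptions made for this sketch; the disclosure does not specify how each manner is implemented.

```python
def fill_missing(pixels, fill=None):
    """Replace None entries with `fill` (default: mean of the known values)."""
    known = [p for p in pixels if p is not None]
    if fill is None:
        fill = sum(known) / len(known) if known else 0.0
    return [fill if p is None else p for p in pixels]


def normalize(pixels):
    """Min-max normalization of the values to the range [0, 1]."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        # A constant input carries no contrast; map everything to 0.
        return [0.0 for _ in pixels]
    return [(p - lo) / (hi - lo) for p in pixels]


def preprocess(pixels):
    """Hypothetical pipeline: fill missing values, then normalize."""
    return normalize(fill_missing(pixels))
```

A call such as preprocess([0, None, 10]) first fills the gap with the mean of the known values and then rescales the row to [0, 1].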

In some possible examples, the calling the operation instruction to perform feature extraction on the first image to obtain a feature image includes:

controlling, by the computation device, the operation unit to perform feature extraction on the first image based on an operation instruction set of at least one thread to obtain a feature image, where the operation instruction set includes at least one operation instruction, and an order of calling the operation instruction in the operation instruction set is customized by a user side or a terminal side.

In some possible examples, the computation device further includes a data access unit and a storage medium,

the computation device controls the operation unit to send the second image to the data access unit and store the second image in the storage medium.

In some possible examples, the operation unit includes a primary operation module and a plurality of secondary operation modules, where the primary operation module is interconnected with the plurality of secondary operation modules by an interconnection module, and when the operation instruction is a convolution operation instruction,

the calling the operation instruction to perform resolution optimization on the first image includes:

controlling, by the computation device, the secondary operation modules to implement a convolution operation of input data and a convolution kernel in a convolutional neural network algorithm, where the input data is the first image and the convolutional neural network algorithm corresponds to the convolution operation instruction,

controlling, by the computation device, the interconnection module to implement data transfer between the primary operation module and the secondary operation modules; before a forward operation of a neural network fully connected layer starts, transferring, by the primary operation module, the input data to each secondary operation module through the interconnection module; and after the computation of the secondary operation modules is completed, splicing, by the interconnection module, output scalars of the respective secondary operation modules stage by stage to obtain an intermediate vector, and sending the intermediate vector back to the primary operation module; and

controlling, by the computation device, the primary operation module to splice intermediate vectors corresponding to all input data into an intermediate result, and performing subsequent operations on the intermediate result.
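The data flow described above can be sketched roughly as follows: each secondary operation module computes one output scalar from the shared input data and its own convolution kernel, and the interconnection module splices the scalars stage by stage into an intermediate vector. This is a simplified software model, not the hardware behavior itself; the function names and the pairwise (binary-tree) merge order are assumptions made for illustration.

```python
def secondary_output(input_vec, kernel):
    """One secondary module: dot product of the shared input and its kernel."""
    return sum(x * w for x, w in zip(input_vec, kernel))


def splice_stage_by_stage(scalars):
    """Merge neighbouring partial vectors level by level, as a binary tree would."""
    level = [[s] for s in scalars]
    while len(level) > 1:
        merged_level = []
        for i in range(0, len(level), 2):
            right = level[i + 1] if i + 1 < len(level) else []
            merged_level.append(level[i] + right)
        level = merged_level
    return level[0] if level else []


def forward(input_vec, kernels):
    """Broadcast the input, compute one scalar per kernel, splice the results."""
    scalars = [secondary_output(input_vec, k) for k in kernels]
    return splice_stage_by_stage(scalars)
```

In this model the order of the scalars in the intermediate vector matches the order of the secondary modules, whether or not the number of modules is a power of two.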

In some possible examples, the performing subsequent operations on the intermediate result includes:

controlling, by the computation device, the primary operation module to add bias data to the intermediate result, and then performing an activation operation.

In some possible examples, the primary operation module includes a first operation unit, where the first operation unit includes a vector addition unit and an activation unit,

the step of controlling, by the computation device, the primary operation module to add bias data to the intermediate result, and then performing an activation operation includes:

controlling, by the computation device, the vector addition unit to implement a bias addition operation of a convolutional neural network operation and perform element-wise addition on bias data and the intermediate result to obtain a bias result; and

controlling, by the computation device, the activation unit to perform an activation function operation on the bias result.

In some possible examples, the primary operation module includes a first storage unit, a first operation unit, and a first data dependency determination unit; and the above method includes:

controlling, by the computation device, the first storage unit to cache input data and output data used by the primary operation module during a computation process, where the output data includes the second image;

controlling, by the computation device, the first operation unit to perform various operational functions of the primary operation module;

controlling, by the computation device, the first data dependency determination unit to ensure that there is no consistency conflict in reading data from and writing data to the first storage unit, read an input neuron vector from the first storage unit, send the vector to the secondary operation modules through the interconnection module, and send an intermediate result vector from the interconnection module to the first operation unit.

In some possible examples, each secondary operation module includes a second operation unit, where the second operation unit includes a vector multiplication unit and an accumulation unit,

the controlling, by the computation device, the secondary operation modules to perform a convolution operation of input data and a convolution kernel in a convolutional neural network algorithm includes:

controlling, by the computation device, the vector multiplication unit to perform a vector multiplication operation of the convolution operation, and

controlling, by the computation device, the accumulation unit to perform an accumulation operation of the convolution operation.

In some possible examples, each secondary operation module includes a second operation unit, a second data dependency determination unit, a second storage unit, and a third storage unit; and the method includes:

controlling, by the computation device, the second operation unit to perform various arithmetic and logical operations of the secondary operation module,

controlling, by the computation device, the second data dependency determination unit to perform a reading/writing operation on the second storage unit and the third storage unit during a computation process and ensure that there is no consistency conflict between the reading and writing operations on the second storage unit and the third storage unit,

controlling, by the computation device, the second storage unit to cache input data and an output scalar obtained from the computation performed by the secondary operation module, and

controlling, by the computation device, the third storage unit to cache a convolution kernel required by the secondary operation module during a computation process.

In some possible examples, the first data dependency determination unit or the second data dependency determination unit ensures that there is no consistency conflict in reading and writing in the following manners: storage addresses corresponding to data/instructions stored in the corresponding storage unit do not overlap; or whether there is a dependency between a control signal that has not been executed and data of a control signal that is being executed is determined; if there is no dependency, the control signal is allowed to be issued immediately; otherwise, the control signal is not allowed to be issued until all control signals on which the control signal depends have been executed, where

the computation device controls the controller unit to obtain an operation instruction from the register unit and decode the operation instruction into the control signal for controlling behavior of other modules, where the other modules include the primary operation module and the plurality of secondary operation modules.
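One possible reading of the dependency check above is an address-range overlap test between a pending control signal and the control signals still being executed. The sketch below is written under that assumption; the ControlSignal record, its reads/writes fields, and the treatment of read-after-write, write-after-read, and write-after-write overlaps as dependencies are illustrative choices and do not come from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlSignal:
    """Hypothetical record of the address ranges a control signal touches."""
    name: str
    reads: tuple   # tuple of (start, length) ranges that are read
    writes: tuple  # tuple of (start, length) ranges that are written


def overlaps(a, b):
    """True if the half-open ranges [start, start + length) intersect."""
    return a[0] < b[0] + b[1] and b[0] < a[0] + a[1]


def depends_on(pending, in_flight):
    """True if the pending signal conflicts with one that is still executing."""
    pairs = (
        [(r, w) for r in pending.reads for w in in_flight.writes]     # read after write
        + [(w, r) for w in pending.writes for r in in_flight.reads]   # write after read
        + [(w, v) for w in pending.writes for v in in_flight.writes]  # write after write
    )
    return any(overlaps(a, b) for a, b in pairs)


def can_issue(pending, executing):
    """Issue immediately only if no executing signal is depended upon."""
    return not any(depends_on(pending, s) for s in executing)
```

A signal that reads a range another executing signal is still writing would be held back by can_issue, while a signal touching disjoint addresses would be issued immediately.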

In some possible examples, the computation device controls the plurality of secondary operation modules to compute respective output scalars in parallel by using the same input data and respective convolution kernels.

In some possible examples, an activation function (active) used by the primary operation module may be any of the following non-linear functions: sigmoid, tanh, relu, and softmax, or may be a linear function.
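For reference, the named activation functions can be written in a few lines using their standard mathematical definitions. The table name ACTIVATIONS and the vector-in, vector-out interface are assumptions made for this sketch.

```python
import math


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def tanh(v):
    return [math.tanh(x) for x in v]


def relu(v):
    return [max(0.0, x) for x in v]


def softmax(v):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def linear(v):
    return list(v)


# Hypothetical lookup table from a function name to its implementation.
ACTIVATIONS = {
    "sigmoid": sigmoid,
    "tanh": tanh,
    "relu": relu,
    "softmax": softmax,
    "linear": linear,
}
```

Softmax differs from the others in that it normalizes across the whole vector rather than acting on each element independently.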

In some possible examples, the interconnection module forms a data channel for continuous or discrete data between the primary operation module and the plurality of secondary operation modules. The interconnection module has any of the following structures: a tree structure, a ring structure, a grid structure, a hierarchical interconnection, or a bus structure.

In a second aspect, an example of the present disclosure provides a computation device which includes a function unit configured to perform the method of the first aspect.

In a third aspect, an example of the present disclosure provides a computer readable storage medium on which a computer program used for electronic data exchange is stored, where the computer program enables a computer to perform the method of the first aspect.

In a fourth aspect, an example of the present disclosure further provides a computer program product which includes a non-transitory computer readable storage medium on which a computer program is stored. The computer program may cause a computer to perform the method of the first aspect.

In a fifth aspect, an example of the present disclosure provides a chip which includes the computation device of the second aspect.

In a sixth aspect, an example of the present disclosure provides a chip package structure which includes the chip of the fifth aspect.

In a seventh aspect, an example of the present disclosure provides a board card which includes the chip package structure of the sixth aspect.

In an eighth aspect, an example of the present disclosure provides an electronic device which includes the board card of the seventh aspect.

In some examples, the electronic device includes a data processing device, a robot, a computer, a printer, a scanner, a tablet, a smart terminal, a mobile phone, a traffic recorder, a navigator, a sensor, a webcam, a server, a cloud-based server, a camera, a video camera, a projector, a watch, a headphone, a mobile storage, a wearable device, a vehicle, a household appliance, and/or medical equipment.

In some examples, the vehicle includes an airplane, a ship, and/or a car. The household appliance includes a television, an air conditioner, a microwave oven, a refrigerator, a rice cooker, a humidifier, a washing machine, an electric lamp, a gas cooker, and/or a range hood. The medical equipment includes a nuclear magnetic resonance spectrometer, a B-ultrasonic scanner, and/or an electrocardiograph.

Technical effects of implementing the examples of the present disclosure are as follows:

It can be seen that through the examples of the present disclosure, the computation device may control a communication unit to obtain a first image to be processed, where the first image has a resolution of a first-level size; and then the computation device may control an operation unit to call an operation instruction to perform resolution optimization to obtain a second image, where the second image has a resolution of a second-level size, the first-level size is smaller than the second-level size, and the operation instruction is a preset instruction for optimizing the resolution of an image. In this way, the resolution of an image can be improved. Compared with the prior art, which uses a general-purpose processor to improve resolution, the present disclosure has technical effects of lower power consumption and faster speed.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to illustrate the technical solutions in the examples of the present disclosure more clearly, the drawings to be used in the description of the examples are briefly explained below. Obviously, the drawings in the description below are some examples of the present disclosure. Other drawings can be obtained according to the disclosed drawings without any creative effort by those skilled in the art.

FIG. 1A is a structural diagram of a computation device according to an example of the present disclosure.

FIG. 1B is a schematic flowchart of a convolutional neural network algorithm.

FIG. 1C is a schematic diagram of an instruction of a device supporting a convolutional neural network forward operation according to an example of the present disclosure.

FIG. 1D is a block diagram of an overall structure of a device for performing a convolutional neural network forward operation according to an example of the present disclosure.

FIG. 1E is a structural diagram of an H-tree module (an implementation of an interconnection module) of a device for performing a convolutional neural network forward operation according to an example of the present disclosure.

FIG. 1F is a block diagram of a structure of a primary operation module of a device for performing a convolutional neural network forward operation according to an example of the present disclosure.

FIG. 1G is a block diagram of a structure of a secondary operation module of a device for performing a convolutional neural network forward operation according to an example of the present disclosure.

FIG. 1H is a block diagram of a process of a single-layer convolutional neural network forward operation according to an example of the present disclosure.

FIG. 2 is a flowchart of an information processing method according to an example of the present disclosure.

FIG. 3 is a schematic diagram of calling an operation instruction based on a single thread according to an example of the present disclosure.

FIG. 4 is a schematic diagram of calling an operation instruction based on multiple threads according to an example of the present disclosure.

FIG. 5 is a structural diagram of another computation device according to an example of the present disclosure.

DETAILED DESCRIPTION OF THE EXAMPLES

Technical solutions in examples of the present disclosure will be described clearly and completely hereinafter with reference to the accompanying drawings in the examples of the present disclosure. Obviously, the examples to be described are merely some rather than all examples of the present disclosure. All other examples obtained by those of ordinary skill in the art based on the examples of the present disclosure without creative efforts shall fall within the protection scope of the present disclosure.

Terms such as “first”, “second”, “third”, and “fourth” in the specification, the claims, and the drawings are used for distinguishing different objects rather than describing a specific order. In addition, terms such as “include”, “have”, and any variant thereof are used for indicating non-exclusive inclusion. For instance, a process, a method, a system, a product, or an equipment including a series of steps or units is not limited to the listed steps or units, but optionally includes steps or units that are not listed, or optionally includes other steps or units inherent to the process, the method, the product, or the equipment.

Reference to “example” means that a particular feature, a structure, or a characteristic described in conjunction with the example may be included in at least one example of the present disclosure. The term used in various places in the specification does not necessarily refer to the same example, nor does it refer to an example that is mutually exclusive, independent, or alternative to other examples. It can be explicitly and implicitly understood by those skilled in the art that the examples described herein may be combined with other examples.

First, a computation device used in the present disclosure is introduced. FIG. 1A provides a computation device, where the device includes a storage medium 611 (optional), a register unit 612, an interconnection module 613, an operation unit 614, a controller unit 615, and a data access unit 616, where

the operation unit 614 includes at least two of the following: an addition arithmetic unit, a multiplication arithmetic unit, a comparator, and an activation arithmetic unit.

The interconnection module 613 is configured to control a connection relationship of the arithmetic units in the operation unit 614 so that the at least two arithmetic units form different computation topologies.

The instruction storage unit 612 (which may be a register unit, an instruction cache, or a scratchpad memory) is configured to store the operation instruction, an address of a data block in the storage medium, and a computation topology corresponding to the operation instruction.

The operation instruction may include an operation field and an opcode. Taking a convolution operation instruction as an example, as shown in Table 1, register 0, register 1, register 2, register 3, and register 4 may be operation fields. Each of the register 0, register 1, register 2, register 3, and register 4 may be one or a plurality of registers.

Table 1

Opcode  | Register 0 | Register 1 | Register 2 | Register 3 | Register 4
COMPUTE | Input data starting address | Input data length | Convolution kernel starting address | Convolution kernel length | Address of an activation function interpolation table
IO      | Address of an external memory of data | Data length | Address of an internal memory of data | |
NOP     | | | | |
JUMP    | Target address | | | |
MOVE    | Input address | Data size | Output address | |
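The instruction formats of Table 1 can be modeled as a mapping from each opcode to the meaning of its register fields. The sketch below is illustrative only: the Opcode enumeration, the field names, and the decode helper are hypothetical and do not describe an actual encoding used by the device.

```python
from enum import Enum


class Opcode(Enum):
    COMPUTE = "COMPUTE"
    IO = "IO"
    NOP = "NOP"
    JUMP = "JUMP"
    MOVE = "MOVE"


# Hypothetical field names for register 0 to register 4, following Table 1.
FIELDS = {
    Opcode.COMPUTE: ("input_addr", "input_len", "kernel_addr",
                     "kernel_len", "activation_table_addr"),
    Opcode.IO: ("external_addr", "data_len", "internal_addr"),
    Opcode.NOP: (),
    Opcode.JUMP: ("target_addr",),
    Opcode.MOVE: ("input_addr", "data_size", "output_addr"),
}


def decode(opcode, registers):
    """Name the register values that the given opcode actually uses."""
    names = FIELDS[opcode]
    if len(registers) < len(names):
        raise ValueError(f"{opcode.value} needs {len(names)} operation fields")
    return dict(zip(names, registers))
```

For instance, decode(Opcode.MOVE, [0, 16, 64]) labels the three values as the input address, the data size, and the output address, while unused trailing registers are ignored.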

The storage medium 611 may be an off-chip memory, and in certain applications, may also be an on-chip memory for storing a data block. The data block may be n-dimensional data, where n is an integer greater than or equal to 1. For instance, when n=1, the data is one-dimensional data, which is a vector; when n=2, the data is two-dimensional data, which is a matrix; and when n is equal to or greater than 3, the data is multi-dimensional data.

The controller unit 615 is configured to fetch an operation instruction, an operation field corresponding to the operation instruction, and a first computation topology corresponding to the operation instruction from the register unit 612, and decode the operation instruction into an execution instruction. The execution instruction is configured to control the operation unit to perform an operation, transfer the operation field to the data access unit 616, and transfer the computation topology to the interconnection module 613.

The data access unit 616 is configured to fetch a data block corresponding to the operation field from the storage medium 611 and transfer the data block to the interconnection module 613.

The interconnection module 613 is configured to receive the first computation topology and the data block. In an example, the interconnection module 613 is further configured to rearrange the data block according to the first computation topology.

The operation unit 614 is configured to call an arithmetic unit of the operation unit 614 according to the execution instruction to perform an operation on the data block to obtain an operation result, transfer the operation result to the data access unit, and store the result in the storage medium. In an example, the operation unit 614 is configured to call an arithmetic unit according to the first computation topology and the execution instruction to perform an operation on the rearranged data block to obtain an operation result, transfer the operation result to the data access unit, and store the result in the storage medium.

In another example, the interconnection module 613 is configured to form the first computation topology according to the connection relationships of the arithmetic units in the operation unit 614.

An interconnection module is provided in the computation device of the present disclosure. According to the needs of the operation instruction, the interconnection module can connect the arithmetic units in the operation unit to obtain a computation topology corresponding to the operation instruction, so that there is no need to store or fetch intermediate data of the computation in subsequent operations of the operation unit. Through this structure, a single instruction with a single input can drive operations of a plurality of arithmetic units to obtain a computation result, which improves the computation efficiency.

A computation method of the computation device shown in FIG. 1A is explained below based on different operation instructions. As an instance, the operation instruction may be a convolution operation instruction. The convolution operation instruction can be applied to a neural network, so the convolution operation instruction may also be called a convolutional neural network operation instruction. A formula to be performed by the convolution operation instruction may be $s = s\left(\sum w x_i + b\right)$, which is to multiply a convolution kernel w by input data x_i, find the sum, add a bias b, and then perform an activation operation s(h) to obtain a final output result s. According to the formula, the computation topology may be obtained, which is: the multiplication arithmetic unit -> the addition arithmetic unit -> the (optional) activation arithmetic unit.

A method of performing a convolution operation instruction by the computation device shown in FIG. 1A may include:

fetching, by the controller unit 615, a convolution operation instruction, an operation field corresponding to the convolution operation instruction, and the first computation topology (the multiplication arithmetic unit -> the addition arithmetic unit -> the addition arithmetic unit -> the activation arithmetic unit) corresponding to the convolution operation instruction from the register unit 612; transferring, by the controller unit, the operation field to a data access unit, and transferring the first computation topology to the interconnection module;

fetching, by the data access unit, a convolution kernel w and a bias b (if b is 0, there is no need to fetch the bias b) corresponding to the operation field from the storage medium, and transferring the convolution kernel w and the bias b to the operation unit; and

multiplying, by the multiplication arithmetic unit of the operation unit, a convolution kernel w and input data x_i to obtain a first result, inputting the first result to the addition arithmetic unit to perform addition to obtain a second result, adding the second result and a bias b to obtain a third result, inputting the third result to the activation arithmetic unit to perform an activation operation to obtain an output result s, transferring the output result s to the data access unit, and storing, by the data access unit, the output result in the storage medium. After each step, the result may be transferred to the data access unit and stored in the storage medium without performing a following step. The step of adding the second result and the bias b to obtain the third result is optional, which means this step is not required when b is 0.
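The sequence of first, second, and third results described above can be traced in a short software sketch. It models only the arithmetic of the computation topology (multiplication, addition, optional bias addition, optional activation) and not the hardware units themselves; the function name conv_instruction and its parameters are assumptions made for illustration.

```python
import math


def conv_instruction(w, x, b=0.0, activation=None):
    """Evaluate activation(sum(w_i * x_i) + b) step by step."""
    # Multiplication arithmetic unit: element-wise products (first result).
    first = [wi * xi for wi, xi in zip(w, x)]
    # Addition arithmetic unit: sum of the products (second result).
    second = sum(first)
    # Second addition: add the bias; skipped when b is 0 (third result).
    third = second + b if b != 0 else second
    # Activation arithmetic unit: optional activation on the third result.
    return activation(third) if activation is not None else third


def sigmoid(h):
    return 1.0 / (1.0 + math.exp(-h))
```

Calling conv_instruction(w, x, b, sigmoid) corresponds to running all four stages under one instruction, while omitting activation stops after the bias addition.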

In addition, the order of addition and multiplication can be reversed.

The technical solution provided by the present disclosure can realize convolution operations according to one instruction which is a convolution operation instruction. There is no need to store or obtain intermediate data of convolution operations (such as a first result, a second result, and a third result). The technical solution may reduce the storing and obtaining operations of intermediate data, and may have technical effects of reducing the corresponding operation steps and improving the performance of convolution operations.

It should be understood that the instruction set used in the present disclosure may include one or a plurality of operation instructions. The operation instruction includes, but is not limited to, a COMPUTE instruction (an operation instruction), a CONFIG instruction, an IO instruction, an NOP instruction, a JUMP instruction, a MOVE instruction, etc. The COMPUTE instruction includes, but is not limited to, a convolution (CONV) instruction, a pooling operation instruction, etc. Specifically, an executable computation instruction in the present disclosure includes:

a convolution operation instruction. In an example, the convolution COMPUTE instruction (the CONV instruction) includes:

a convolutional neural network sigmoid instruction: according to the instruction, a device fetches input data and a convolution kernel of a specified size from a specified address in a memory (optionally a scratchpad memory or a scalar register file), performs a convolution operation in a convolution operation component, and optionally, performs sigmoid activation on an output result;

a convolutional neural network TanH instruction: according to the instruction, the device fetches input data and a convolution kernel of a specified size from a specified address in a memory (optionally a scratchpad memory) respectively, performs a convolution operation in the convolution operation component, and then performs TanH activation on an output result;

a convolutional neural network ReLU instruction: according to the instruction, the device fetches input data and a convolution kernel of a specified size from a specified address in the memory (optionally a scratchpad memory) respectively, performs a convolution operation in a convolution operation component, and then performs ReLU activation on an output result; and

a convolutional neural network group instruction: according to the instruction, the device fetches input data and a convolution kernel of a specified size from a specified address in the memory (optionally a scratchpad memory) respectively, partitions the input data and the convolution kernel into groups, performs a convolution operation in a convolution operation component, and then performs activation on an output result.

A convolution operation instruction (pure convolution operation instruction): according to the instruction, the device fetches input data and a convolution kernel of a specified size from a specified address in the memory (optionally a scratchpad memory) respectively, and performs a convolution operation in a convolution operation component. The above-mentioned specified size may be set by the user or manufacturer. For instance, in a computation device of a first manufacturer, the specified size may be set to data of A bits, and in a computation device of a second manufacturer, the specified size may be set to data of B bits. The data of A bits and the data of B bits have different sizes.

The pooling instruction. In an example, the pooling COMPUTE instruction (the pooling operation instruction, which is also referred to as the pooling instruction in the present disclosure) specifically includes:

a Maxpooling forward operation instruction: according to the instruction, the device fetches input data of a specified size from a specified address in a memory (optionally a scratchpad memory or a scalar register file), performs a Maxpooling forward operation in a pooling operation component, and writes a result back to a specified address in the memory (optionally a scratchpad memory or a scalar register file);

a Maxpooling backward training instruction: according to the instruction, the device fetches input data of a specified size from a specified address in a memory (optionally a scratchpad memory or a scalar register file), performs Maxpooling backward training in a pooling operation component, and writes a result back to a specified address in the memory (optionally a scratchpad memory or a scalar register file);

an Avgpooling forward operation instruction: according to the instruction, the device fetches input data of a specified size from a specified address in a memory (optionally a scratchpad memory or a scalar register file), performs an Avgpooling forward operation in a pooling operation component, and writes a result back to a specified address in the memory (optionally a scratchpad memory or a scalar register file);

an Avgpooling backward training instruction: according to the instruction, the device fetches input data of a specified size from a specified address in a memory (optionally a scratchpad memory or a scalar register file), performs Avgpooling backward training in a pooling operation component, and writes a result back to a specified address in the memory (optionally a scratchpad memory or a scalar register file);

a Minpooling forward operation instruction: according to the instruction, the device fetches input data of a specified size from a specified address in a memory (optionally a scratchpad memory or a scalar register file), performs a Minpooling forward operation in a pooling operation component, and writes a result back to a specified address in the memory (optionally a scratchpad memory or a scalar register file); and

a Minpooling backward training instruction: according to the instruction, the device fetches input data of a specified size from a specified address in a memory (optionally a scratchpad memory or a scalar register file), performs Minpooling backward training in a pooling operation component, and writes a result back to a specified address in the memory (optionally a scratchpad memory or a scalar register file).
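As an illustration only, the three forward pooling operations listed above can be sketched in software as follows. The function name pool_forward, the non-overlapping square window, and the use of nested Python lists are assumptions made for this sketch; the disclosure does not fix the window stride or the data layout.

```python
def pool_forward(data, kernel, mode):
    """Pool a 2-D list with a non-overlapping kernel x kernel window.

    Hypothetical helper: the window geometry is an assumption of this sketch.
    """
    reducers = {
        "max": max,                                       # Maxpooling forward
        "min": min,                                       # Minpooling forward
        "avg": lambda window: sum(window) / len(window),  # Avgpooling forward
    }
    reduce_window = reducers[mode]
    rows, cols = len(data), len(data[0])
    out = []
    for r in range(0, rows - kernel + 1, kernel):
        out_row = []
        for c in range(0, cols - kernel + 1, kernel):
            # Gather the elements covered by the current pooling window.
            window = [data[r + i][c + j]
                      for i in range(kernel) for j in range(kernel)]
            out_row.append(reduce_window(window))
        out.append(out_row)
    return out


feature_map = [
    [1, 2, 5, 6],
    [3, 4, 7, 8],
    [9, 10, 13, 14],
    [11, 12, 15, 16],
]
max_pooled = pool_forward(feature_map, 2, "max")
avg_pooled = pool_forward(feature_map, 2, "avg")
min_pooled = pool_forward(feature_map, 2, "min")
```

In hardware, the division in the average case would typically be replaced by a multiplication with a preconfigured reciprocal of the window area; the sketch uses a plain division for clarity.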

A batch normalization instruction can be used for a batch normalization computation.

A fully connected instruction may include a fully connected layer forward operation instruction.

A fully connected layer forward operation instruction: according to the instruction, a device fetches weight data and bias data from a specified address in a memory, performs a full connection operation in a computation unit, and writes a result back to a specified address in a scratchpad memory.

The CONFIG instruction configures various constants required by a computation of a current artificial neural network layer before the computation starts. For instance, 1/kernel_area can be obtained by configuration using the CONFIG instruction. In the batch normalization computation, the CONFIG instruction configures various constants required for a current layer before a batch normalization computation begins.

The IO instruction is for reading in input data required for a computation from an external storage space, and storing data to the external space after the computation finishes.

The NOP instruction is for emptying control signals in all control signal cache queues in the current device, and ensuring that all instructions before the NOP instruction are finished. The NOP instruction itself does not include any operations.

The JUMP instruction is for controlling jumping of a next instruction address to be read from an instruction storage unit, so that the jumping of a control flow can be realized.

The MOVE instruction is for moving data of an address in an internal address space of the device to another address in the internal address space of the device. This process is independent of an operation unit and does not occupy resources of the operation unit during execution.

Optionally, operation instructions that can be executed by the computation device may further include:

a Matrix Mult Vector (MMV) instruction: according to the instruction, the device fetches matrix data and vector data of a set length from a specified address in a scratchpad memory, performs a matrix-multiply-vector operation in the operation unit, and writes a computation result back to a specified address in the scratchpad memory; it is worth noting that a vector can be stored in the scratchpad memory as a matrix of a special form (a matrix with only one row of elements);

a Vector Mult Matrix (VMM) instruction: according to the instruction, the device fetches vector data and matrix data of a set length from a specified address in a scratchpad memory, performs a vector-multiply-matrix operation in the operation unit, and writes a computation result back to a specified address in the scratchpad memory; it is worth noting that a vector can be stored in the scratchpad memory as a matrix of a special form (a matrix with only one row of elements);

a Matrix Mult Scalar (VMS) instruction: according to the instruction, the device fetches matrix data of a set length from a specified address in a scratchpad memory, fetches scalar data from a specified address of a scalar register file, performs a scalar-multiply-matrix operation in the operation unit, and writes a computation result back to a specified address in the scratchpad memory; it is worth noting that the scalar register file stores not only an address of the matrix but also scalar data;

a Tensor Operation (TENS) instruction: according to the instruction, the device fetches two pieces of matrix data of a set length from two specified addresses in a scratchpad memory, performs a tensor operation on the two pieces of matrix data in the operation unit, and writes a result back to a specified address of the scratchpad memory;

a Matrix Add Matrix (MA) instruction: according to the instruction, the device fetches two pieces of matrix data of a set length from two specified addresses in a scratchpad memory, adds the two pieces of matrix data in the operation unit, and writes a computation result back to a specified address in the scratchpad memory;

a Matrix Sub Matrix (MS) instruction: according to the instruction, the device fetches two pieces of matrix data of a set length from two specified addresses in a scratchpad memory, performs a subtraction operation on the two pieces of matrix data in the operation unit, and writes a computation result back to a specified address in the scratchpad memory;

a Matrix Retrieval (MR) instruction: according to the instruction, the device fetches vector data of a set length from a specified address in a scratchpad memory, and fetches matrix data of a specified size from a specified address in the scratchpad memory; in the operation unit, the vector is an index vector, and an i-th element of an output vector is a number obtained from an i-th column of the matrix by using an i-th element of the index vector as an index; and the output vector is written back to a specified address in the scratchpad memory;

a Matrix Load (ML) instruction: according to the instruction, the device loads data of a set length from a specified external source address to a specified address in a scratchpad memory;

a Matrix Store (MS) instruction: according to the instruction, the device stores matrix data of a set length from a specified address in a scratchpad memory to an external target address;

a Matrix Move (MMOVE) instruction: according to the instruction, the device moves matrix data of a set length from a specified address in a scratchpad memory to another specified address in the scratchpad memory;

a Vector-Inner-Product instruction (VP): according to the instruction, the device fetches vector data of a specified size from a specified address in a memory (optionally a scratchpad memory or a scalar register file), performs an inner product (a scalar) on two vectors in a vector computation unit, and writes the result back; optionally, the result is written back to a specified address in the memory (optionally a scratchpad memory or a scalar register file);

a vector cross product instruction (TENS): according to the instruction, the device fetches vector data of a specified size from a specified address in a memory (optionally a scratchpad memory or a scalar register file), performs a cross product operation on two vectors in a vector computation unit, and writes the result back; optionally, the result is written back to a specified address in the memory (optionally a scratchpad memory or a scalar register file);

a vector elementary arithmetic operation including a Vector-Add-Scalar instruction (VAS): according to the instruction, the device fetches vector data of a specified size from a specified address in a memory (optionally a scratchpad memory or a scalar register file), fetches scalar data from a specified address of a scalar register file of the memory, adds the scalar to each element of the vector in a scalar computation unit, and writes the result back; optionally, the result is written back to a specified address in the memory (optionally a scratchpad memory or a scalar register file);

a Scalar-Sub-Vector instruction (SSV): according to the instruction, the device fetches scalar data from a specified address in the scalar register file of a memory (optionally a scratchpad memory or a scalar register file), fetches vector data from a specified address in the memory (optionally the scratchpad memory or the scalar register file), subtracts corresponding elements of the vector from the scalar in a vector computation unit, and writes the result back; optionally, the result is written back to a specified address in the memory (optionally a scratchpad memory or a scalar register file);

a Vector-Dev-Vector instruction (VD): according to the instruction, the device fetches vector data of a specified size from a specified address in a memory (optionally a scratchpad memory or a scalar register file), performs an element-wise division of two vectors in a vector computation unit, and writes the result back; optionally, the result is written back to a specified address in the memory (optionally a scratchpad memory or a scalar register file);

a Scalar-Dev-Vector instruction (SDV): according to the instruction, the device fetches scalar data from a specified address in the scalar register file of a memory (optionally a scratchpad memory or a scalar register file), fetches vector data of a specified size from a specified address in the memory (optionally the scratchpad memory), divides the scalar by corresponding elements in the vector in a vector computation unit, and writes the result back; optionally, the result is written back to a specified position in the memory (optionally a scratchpad memory or a scalar register file).
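The arithmetic behind several of the matrix and vector instructions above can be illustrated with a short sketch. The function names are hypothetical, data are modeled as plain Python lists rather than scratchpad addresses, and the reading of the MR instruction (the index selects a row within the i-th column) is an assumption drawn from the wording of that item.

```python
def matrix_mult_vector(matrix, vector):
    """MMV: each output element is the dot product of one matrix row with the vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def vector_mult_matrix(vector, matrix):
    """VMM: each output element is the dot product of the vector with one matrix column."""
    return [sum(v * row[j] for v, row in zip(vector, matrix))
            for j in range(len(matrix[0]))]


def matrix_retrieval(index_vector, matrix):
    """MR: output[i] is taken from column i, at the row given by index_vector[i]."""
    return [matrix[row][col] for col, row in enumerate(index_vector)]


def scalar_sub_vector(scalar, vector):
    """SSV: subtract each vector element from the scalar."""
    return [scalar - v for v in vector]


def scalar_div_vector(scalar, vector):
    """SDV: divide the scalar by each vector element."""
    return [scalar / v for v in vector]


matrix = [[1, 2], [3, 4]]
mmv = matrix_mult_vector(matrix, [5, 6])
vmm = vector_mult_matrix([5, 6], matrix)
retrieved = matrix_retrieval([1, 0], matrix)
ssv = scalar_sub_vector(10, [1, 2, 3])
sdv = scalar_div_vector(12, [2, 3, 4])
```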

The computation device can also execute a vector logic instruction, including:

a Vector-AND-Vector instruction (VAV): according to the instruction, the device fetches vector data of a specified size from a specified address in a memory (optionally a scratchpad memory or a scalar register file) respectively, performs an element-wise AND on two vectors in a vector computation unit, and writes the result back; optionally, the result is written back to a specified address in the memory (optionally a scratchpad memory or a scalar register file);

a Vector-AND instruction (VAND): according to the instruction, the device fetches vector data of a specified size from a specified address in a memory (optionally a scratchpad memory or a scalar register file), performs an AND operation on each element of the vector in a vector computation unit, and writes the result back; optionally, the result is written back to a specified address in the scalar register file of the memory (optionally a scratchpad memory or a scalar register file);

a Vector-OR-Vector instruction (VOV): according to the instruction, the device fetches vector data of a specified size from a specified address in a memory (optionally a scratchpad memory) respectively, performs an element-wise OR operation on two vectors in a vector computation unit, and writes the result back; optionally, the result is written back to a specified address in the memory (optionally a scratchpad memory or a scalar register file);

a Vector-OR instruction (VOR): according to the instruction, the device fetches vector data of a specified size from a specified address in a memory (optionally a scratchpad memory or a scalar register file), performs an OR operation on each element of the vector in a vector computation unit, and writes the result back; optionally, the result is written back to a specified address in the scalar register file of the memory (optionally a scratchpad memory or a scalar register file);

a transcendental function instruction: according to the instruction, the device fetches vector data of a specified size from a specified address in a memory (optionally a scratchpad memory or a scalar register file), performs a transcendental function operation on the vector data in an operation unit, and writes the result back; optionally, the result is written back to a specified address in a storage unit of the memory (optionally a scratchpad memory or a scalar register file).
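A minimal sketch of the vector logic instructions follows, assuming vector elements are interpreted as truth values and results are encoded as 0 or 1. The function names are hypothetical. VAND is read here as a reduction over a single vector, by analogy with VOR, whose result goes to the scalar register file; this reading is an assumption and not a statement of how the device implements it.

```python
def vector_and_vector(a, b):
    """VAV: element-wise AND of two equal-length vectors."""
    return [int(bool(x) and bool(y)) for x, y in zip(a, b)]


def vector_or_vector(a, b):
    """VOV: element-wise OR of two equal-length vectors."""
    return [int(bool(x) or bool(y)) for x, y in zip(a, b)]


def vector_and(a):
    """VAND (assumed reduction): 1 only if every element is non-zero."""
    return int(all(a))


def vector_or(a):
    """VOR (reduction): 1 if at least one element is non-zero."""
    return int(any(a))


vav = vector_and_vector([1, 0, 1, 1], [1, 1, 0, 1])
vov = vector_or_vector([1, 0, 0, 0], [0, 0, 1, 0])
vand = vector_and([1, 1, 0])
vor = vector_or([0, 0, 1])
```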

The computation device can also execute a vector comparison operation instruction, including:

a Greater-Equal operation instruction (GE): according to the instruction, the device may obtain parameters of the instruction, including a length of a vector, starting addresses of two vectors, and a storage address of an output vector, directly from the instruction or by accessing a register of a memory (optionally a scratchpad memory or a scalar register file) whose number is provided by the instruction, then read data of the two vectors, and compare the elements at all positions in the vectors in a vector comparison operation unit; at each position, if the value of the first vector is greater than or equal to the value of the second vector, the value of the comparison result vector at that position is set to 1, otherwise it is set to 0; finally, the comparison result is written back to a specified storage address in the memory (optionally the scratchpad memory or the scalar register file);

a Less-Equal operation instruction (LE): according to the instruction, the device may obtain the parameters of the instruction, including the length of a vector, the starting addresses of the two vectors, and the storage address of the output vector, directly from the instruction or by accessing a register of a memory (optionally a scratchpad memory or a scalar register file) whose number is provided by the instruction, then read the data of the two vectors, and compare the elements at all positions in the vectors in a vector comparison operation unit; at each position, if the value of the first vector is less than or equal to the value of the second vector, the value of the comparison result vector at that position is set to 1, otherwise it is set to 0; finally, the comparison result is written back to a specified storage address in the memory (optionally the scratchpad memory or the scalar register file);

a Greater-Than operation instruction (GT): according to the instruction, the device may obtain the parameters of the instruction, including the length of a vector, the starting addresses of the two vectors, and the storage address of the output vector, directly from the instruction or by accessing a register of a memory (optionally a scratchpad memory or a scalar register file) whose number is provided by the instruction, then read the data of the two vectors, and compare the elements at all positions in the vectors in a vector comparison operation unit; at each position, if the value of the first vector is greater than the value of the second vector, the value of the comparison result vector at that position is set to 1, otherwise it is set to 0; finally, the comparison result is written back to a specified storage address in the memory (optionally the scratchpad memory or the scalar register file);

a Less-Than operation instruction (LT): according to the instruction, the device may obtain the parameters of the instruction, including the length of a vector, the starting addresses of the two vectors, and the storage address of the output vector, directly from the instruction or by accessing a register of a memory (optionally a scratchpad memory or a scalar register file) whose number is provided by the instruction, then read the data of the two vectors, and compare the elements at all positions in the vectors in a vector comparison operation unit; at each position, if the value of the first vector is less than the value of the second vector, the value of the comparison result vector at that position is set to 1, otherwise it is set to 0; finally, the comparison result is written back to a specified storage address in the memory (optionally the scratchpad memory or the scalar register file);

an Equal operation instruction (EQ): according to the instruction, the device may obtain the parameters of the instruction, including the length of a vector, the starting addresses of the two vectors, and the storage address of the output vector, directly from the instruction or by accessing a register of a memory (optionally a scratchpad memory or a scalar register file) whose number is provided by the instruction, then read the data of the two vectors, and compare the elements at all positions in the vectors in a vector comparison operation unit; at each position, if the value of the first vector is equal to the value of the second vector, the value of the comparison result vector at that position is set to 1, otherwise it is set to 0; finally, the comparison result is written back to a specified storage address in the memory (optionally the scratchpad memory or the scalar register file);

an Unequal operation instruction (UEQ): according to the instruction, the device may obtain the parameters of the instruction, including the length of a vector, the starting addresses of the two vectors, and the storage address of the output vector, directly from the instruction or by accessing a register of a memory (optionally a scratchpad memory or a scalar register file) whose number is provided by the instruction, then read the data of the two vectors, and compare the elements at all positions in the vectors in a vector comparison operation unit; at each position, if the value of the first vector is unequal to the value of the second vector, the value of the comparison result vector at that position is set to 1, otherwise it is set to 0; finally, the comparison result is written back to a specified storage address in the memory (optionally the scratchpad memory or the scalar register file);

a Vector Max instruction (VMAX): according to the instruction, the device fetches vector data of a specified size from a specified address in a memory (optionally a scratchpad memory or a scalar register file), selects a largest element from the vector data as a result, and writes the result back; optionally, the result is written back to a specified address in the scalar register file of the memory (optionally a scratchpad memory or a scalar register file);

a Vector Min instruction (VMIN): according to the instruction, the device fetches vector data of a specified size from a specified address in a memory (optionally a scratchpad memory or a scalar register file), selects a minimum element from the vector data as a result, and writes the result back; optionally, the result is written back to a specified address in the scalar register file of the memory (optionally a scratchpad memory or a scalar register file);

a Cyclic Shift operation instruction: according to the instruction, the device may obtain parameters of the instruction directly from the instruction or by accessing a register of a memory (optionally a scratchpad memory or a scalar register file) whose number is provided by the instruction, then cyclically shift vectors in a vector shift unit (which may be a separate vector shift unit or a computation unit), and then write the result of the shift back to a specified storage address in the memory (optionally the scratchpad memory or the scalar register file), where the format of the cyclic shift operation instruction may include four operation fields: a starting address of a vector, a length of the vector, a shift stride, and a storage address of an output vector; and

a Random-Vector generation instruction: according to the instruction, the device reads one or more parameters of a random distribution, and the size and storage address of a random vector to be generated, from the instruction or from the register of a memory (optionally a scratchpad memory or a scalar register file), generates the random vector that is in line with the random distribution in a random vector generation unit, and then writes the generated random vector back to the specified storage address in the memory (optionally the scratchpad memory or the scalar register file).
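The comparison and cyclic shift instructions above can be sketched as follows. The table COMPARATORS and the function names are hypothetical, and the shift direction (toward higher indices) is an assumption, since the text does not state it.

```python
import operator

# Hypothetical mapping from the opcode mnemonics in the text to comparison functions.
COMPARATORS = {
    "GE": operator.ge,
    "LE": operator.le,
    "GT": operator.gt,
    "LT": operator.lt,
    "EQ": operator.eq,
    "UEQ": operator.ne,
}


def vector_compare(opcode, first, second):
    """Return a 0/1 vector: 1 where the comparison holds at that position."""
    if len(first) != len(second):
        raise ValueError("both vectors must have the same length")
    compare = COMPARATORS[opcode]
    return [1 if compare(a, b) else 0 for a, b in zip(first, second)]


def cyclic_shift(vector, stride):
    """Rotate the vector by `stride` positions toward higher indices (assumed direction)."""
    if not vector:
        return []
    stride %= len(vector)
    return vector[-stride:] + vector[:-stride]


ge_result = vector_compare("GE", [1, 5, 3], [2, 5, 1])
ueq_result = vector_compare("UEQ", [1, 5, 3], [2, 5, 1])
shifted = cyclic_shift([1, 2, 3, 4, 5], 2)
largest = max([3, 9, 4])   # VMAX selects the largest element
smallest = min([3, 9, 4])  # VMIN selects the smallest element
```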

The Random-Vector generation instruction may be:

a Uniform distribution instruction (UNIF): according to the instruction, the device reads upper and lower bound parameters of the uniform distribution, and the size and storage address of the random vector to be generated, from the instruction or from the register file of a memory (optionally a scratchpad memory or a scalar register file), generates the random vector that is in line with the uniform distribution in a random vector generation unit, and then writes the generated random vector back to the specified storage address in the memory (optionally the scratchpad memory or the scalar register file); and

a Gaussian distribution instruction (GAUS): according to the instruction, the device reads mean and variance parameters of the Gaussian distribution, and the size and storage address of the random vector to be generated, from the instruction or from the register of a memory (optionally a scratchpad memory or a scalar register file), generates the random vector that is in line with the Gaussian distribution in a random vector generation unit, and then writes the generated random vector back to the specified storage address in the memory (optionally the scratchpad memory or the scalar register file).
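For illustration, the two random vector instructions can be modeled with Python's standard random module. The function names and the explicit rng argument are assumptions of this sketch; the random vector generation unit of the device is not described at this level of detail.

```python
import math
import random


def uniform_vector(lower, upper, size, rng):
    """UNIF sketch: `size` samples drawn uniformly between the two bounds."""
    return [rng.uniform(lower, upper) for _ in range(size)]


def gaussian_vector(mean, variance, size, rng):
    """GAUS sketch: the instruction carries a variance, so convert it to a standard deviation."""
    std_dev = math.sqrt(variance)
    return [rng.gauss(mean, std_dev) for _ in range(size)]


rng = random.Random(0)  # fixed seed so that repeated runs give the same vectors
unif = uniform_vector(-1.0, 1.0, 8, rng)
gaus = gaussian_vector(0.0, 4.0, 8, rng)
```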

When the computation device shown in FIG. 1A is used to execute a convolutional neural network algorithm (a convolution operation instruction), please refer to the flowchart of the convolutional neural network algorithm shown in FIG. 1B. As shown in FIG. 1B, a convolutional neural network includes output data, an activation function, an input data layer, and a convolution kernel.

Each computation process includes: selecting corresponding input data x_(i) in the input data layer according to a convolution window, and then performing an addition operation on the input data and the convolution kernel. A computation process of the output data is s = s(Σ w·x_(i) + b), which is to multiply a convolution kernel w by input data x_(i), find the sum, add a bias b, and then perform an activation operation s(h) to obtain a final output data s. The multiplication of the convolution kernel and the input data is vector multiplication.

According to the size k_(x) of the convolution kernel on the X axis and the size k_(y) of the convolution kernel on the Y axis, the convolution window first selects input data of the same size as the convolution kernel from the input data, whose size is W on the X axis and H on the Y axis, then performs horizontal translation followed by vertical translation according to translation position vectors S_(x) and S_(y) of the convolution window, and traverses all the input data.
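The window traversal and the computation s(Σ w·x_(i) + b) described above can be sketched as follows, assuming a single input channel, no padding, and scalar strides stride_x and stride_y standing in for S_(x) and S_(y). The function names and the choice of ReLU as the activation are illustrative assumptions.

```python
def relu(h):
    """One possible activation function s(h); the text does not fix a particular one."""
    return max(0.0, h)


def conv_forward(inputs, kernel, bias, stride_x, stride_y, activation):
    """Slide a k_y x k_x window over an H x W input, horizontally first, then vertically."""
    height, width = len(inputs), len(inputs[0])
    k_y, k_x = len(kernel), len(kernel[0])
    outputs = []
    for top in range(0, height - k_y + 1, stride_y):        # vertical translation
        out_row = []
        for left in range(0, width - k_x + 1, stride_x):    # horizontal translation
            # Multiply the kernel by the window element-wise and accumulate.
            acc = sum(kernel[i][j] * inputs[top + i][left + j]
                      for i in range(k_y) for j in range(k_x))
            out_row.append(activation(acc + bias))           # add bias, then activate
        outputs.append(out_row)
    return outputs


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, 1]]
result = conv_forward(image, kernel, 1, 1, 1, relu)
```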

FIG. 1C shows a format of an instruction set according to an example of the present disclosure. As shown in the figure, a convolutional neural network operation instruction includes at least one opcode and at least one operation field. The opcode is for indicating a function of the convolutional neural network operation instruction. A convolutional neural network operation unit can perform a convolutional neural network operation by identifying the opcode. The operation field is for indicating data information of the convolutional neural network operation instruction. The data information may be an immediate operand or a register number (which, optionally, may be a register file), which includes a starting address and a length of input data, a starting address and a length of the convolution kernel, and a type of an activation function.
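One way to picture this instruction format in software is a record holding an opcode and named operation fields, each of which is either an immediate operand or a register number. The class names, field names, and register contents below are hypothetical and chosen only for this sketch; they are not the encoding shown in FIG. 1C.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Operand:
    kind: str   # "imm" for an immediate operand, "reg" for a register number
    value: int


@dataclass(frozen=True)
class Instruction:
    opcode: str                 # indicates the function of the instruction
    fields: Dict[str, Operand]  # operation fields carrying the data information


def resolve_fields(instruction, register_file):
    """Replace register numbers by the values stored in the register file."""
    resolved = {}
    for name, operand in instruction.fields.items():
        if operand.kind == "imm":
            resolved[name] = operand.value
        elif operand.kind == "reg":
            resolved[name] = register_file[operand.value]
        else:
            raise ValueError(f"unknown operand kind: {operand.kind}")
    return resolved


# Hypothetical register contents and field names.
registers = {0: 0x1000, 1: 64}
compute = Instruction(
    opcode="COMPUTE",
    fields={
        "input_addr": Operand("reg", 0),
        "input_len": Operand("reg", 1),
        "kernel_addr": Operand("imm", 0x2000),
        "kernel_len": Operand("imm", 9),
    },
)
resolved = resolve_fields(compute, registers)
```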

The instruction set includes: convolutional neural network COMPUTE instructions with different functions, a CONFIG instruction, an IO instruction, an NOP instruction, a JUMP instruction, and a MOVE instruction. The above operation instructions will not be further described herein. For details, please refer to related descriptions in the above examples.

Optionally, the instruction set may further include a convolution activation CONV_ACTIVATE instruction.

The convolution activation CONV_ACTIVATE instruction: according to the instruction, the device fetches input data and a convolution kernel of a specified size from a specified address in a memory (optionally a scratchpad memory), performs a convolution operation in a convolution operation component, and then performs an activation function operation on an output result; the above-mentioned specified size may be set by the manufacturer or user.

In one example, the CONV_ACTIVATE instruction includes: a convolution operation instruction and an activation instruction. The activation instruction is configured to perform an activation function operation, and the convolution operation instruction is configured to perform a convolution operation. For details, please refer to related descriptions in the above examples.

FIG. 1D is a schematic structural diagram of a device for performing a convolutional neural network forward operation according to an example of the present disclosure. As shown in FIG. 1D, the device includes an instruction storage unit 1, a controller unit 2, a data access unit 3, an interconnection module 4, a primary operation module 5, and a plurality of secondary operation modules 6. The instruction storage unit 1, the controller unit 2, the data access unit 3, the interconnection module 4, the primary operation module 5, and the plurality of secondary operation modules 6 may all be realized in a form of a hardware circuit (for instance, including but not limited to FPGA, CGRA, ASIC, analog circuit, memristor, etc.).

The instruction storage unit 1 is configured to read an instruction through the data access unit 3 and store the instruction.

The controller unit 2 is configured to read an instruction from the instruction storage unit 1, decode the instruction into a control signal for controlling the behavior of other modules, and send the control signal to other modules such as the data access unit 3, the primary operation module 5, and the plurality of secondary operation modules 6.

The data access unit 3 can access an external address space and directly read and write data to each storage unit inside the device to complete the loading and storage of the data.

The interconnection module 4 is configured to connect the primary operation module and the secondary operation modules, and can be implemented as different interconnection topologies (such as tree structure, ring structure, grid structure, hierarchical interconnection, bus structure, etc.).

FIG. 1E schematically shows an implementation of the interconnection module 4: an H-tree module. The interconnection module 4 forms a data path between the primary operation module 5 and the plurality of secondary operation modules 6, where the data path is a binary tree path composed of a plurality of nodes. Each node can transfer data received from an upstream node to two downstream nodes, and merge data returned by the two downstream nodes and return the merged data to an upstream node. For instance, at the beginning of a computational phase of a convolutional neural network, neuron data in the primary operation module 5 is sent to each secondary operation module 6 through the interconnection module 4; when the secondary operation modules 6 finish computing, neuron values output by the respective secondary operation modules are spliced stage-by-stage into a complete vector composed of neurons in the interconnection module. For instance, if there are N secondary operation modules in the device, input data x_(i) is transferred to the N secondary operation modules, and each of the secondary operation modules performs a convolution operation on the input data x_(i) and the convolution kernel corresponding to the secondary operation module to obtain a piece of scalar data. The scalar data of the respective secondary operation modules are merged into an intermediate vector including N elements by the interconnection module 4. If the convolution window obtains a total of A*B pieces of input data x_(i) by traversal (A pieces in the X direction and B pieces in the Y direction, where X and Y are coordinate axes of the three-dimensional orthogonal coordinate system), a convolution operation is performed on the above A*B pieces of x_(i), and all the vectors obtained are merged in the primary operation module to obtain a three-dimensional intermediate result of A*B*N.
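The stage-by-stage splicing in the H-tree, and the way the N-element intermediate vectors accumulate into an A*B*N result, can be sketched in software as follows. The function names are hypothetical, each window is modeled as a flattened list, and the tree is simulated by pairwise concatenation of neighboring partial vectors; this is a behavioral sketch, not a description of the circuit.

```python
def htree_merge(leaf_outputs):
    """Splice scalar outputs of the secondary modules pairwise, level by level."""
    if not leaf_outputs:
        raise ValueError("at least one secondary module output is required")
    level = [[value] for value in leaf_outputs]
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            # A node merges the data returned by its (up to) two downstream nodes.
            right = level[i + 1] if i + 1 < len(level) else []
            next_level.append(level[i] + right)
        level = next_level
    return level[0]


def secondary_module(window, kernel):
    """Each secondary module convolves the broadcast window with its own kernel."""
    return sum(w * x for w, x in zip(kernel, window))


def forward_all_windows(windows, kernels):
    """windows: B rows of A flattened windows; result has shape B x A x N."""
    return [[htree_merge([secondary_module(win, k) for k in kernels])
             for win in row]
            for row in windows]


kernels = [[1, 0], [0, 1], [1, 1]]               # N = 3 secondary modules
windows = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]   # A = 2, B = 2
intermediate = forward_all_windows(windows, kernels)
```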

FIG. 1F is a block diagram of a structure of the primary operation module 5 of a device for performing a convolutional neural network forward operation according to an example of the present disclosure. As shown in FIG. 1F, the primary operation module 5 includes a first operation unit 51, a first data dependency determination unit 52, and a first storage unit 53.

The first operation unit 51 includes a vector addition unit 511 and an activation unit 512. The first operation unit 51 is configured to receive a control signal from the controller unit and complete various operational functions of the primary operation module 5. The vector addition unit 511 is configured to perform an operation of adding a bias in the forward computation of the convolutional neural network, and perform element-wise addition on bias data and the intermediate results to obtain a bias result. The activation unit 512 performs an activation function operation on the bias result. The bias data may be read in from an external address space, or may be stored locally.

The first data dependency determination unit 52 is a port for the first operation unit 51 to read/write the first storage unit 53, so as to ensure consistency in reading data from and writing data to the first storage unit 53. At the same time, the first data dependency determination unit 52 is also configured to send data read from the first storage unit 53 to the secondary operation modules through the interconnection module 4. Output data of the secondary operation modules 6 is directly sent to the first operation unit 51 through the interconnection module 4. An instruction output by the controller unit 2 is sent to the first operation unit 51 and the first data dependency determination unit 52 to control their behavior.

The first storage unit 53 is configured to cache input data and output data used by the primary operation module 5 during a computation process.

FIG. 1G is a block diagram of a structure of the secondary operation modules 6 of a device for performing a convolutional neural network forward operation according to an example of the present disclosure. As shown in FIG. 1G, each secondary operation module 6 includes a second operation unit 61, a second data dependency determination unit 62, a second storage unit 63, and a third storage unit 64.

The second operation unit 61 is configured to receive a control signal from the controller unit 2 and perform a convolution operation. The second operation unit includes a vector multiplication unit 611 and an accumulation unit 612, which are respectively responsible for a vector multiplication operation and an accumulation operation in a convolution operation.

The second data dependency determination unit 62 is responsible for reading and writing the second storage unit 63 during a computation process. Before performing read/write operations, the second data dependency determination unit 62 first ensures that there is no consistency conflict between the reading and writing of data used by instructions. For instance, all control signals sent to the second data dependency determination unit 62 are stored in an instruction queue inside the second data dependency determination unit 62. In this queue, if a range of data to be read by a reading instruction conflicts with a range of data to be written by a writing instruction that is located ahead of it in the queue, the reading instruction can be executed only after the writing instruction on which it depends has been executed.
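The read-after-write check described above can be sketched as an address-range overlap test over the queued control signals. The class Access, its fields, and the function names are hypothetical; the sketch models only the case named in the text, in which a read must wait for an earlier write to an overlapping range.

```python
from dataclasses import dataclass


@dataclass
class Access:
    kind: str    # "read" or "write"
    start: int   # first address touched
    length: int  # number of addresses touched


def ranges_overlap(a, b):
    """True if the two half-open address ranges share at least one address."""
    return a.start < b.start + b.length and b.start < a.start + a.length


def can_issue(queue, position):
    """A read may issue only if no earlier write in the queue overlaps its range."""
    candidate = queue[position]
    if candidate.kind != "read":
        return True  # other hazards are outside the scope of this sketch
    return not any(
        earlier.kind == "write" and ranges_overlap(candidate, earlier)
        for earlier in queue[:position]
    )


queue = [
    Access("write", start=0, length=16),
    Access("read", start=8, length=4),    # overlaps the pending write
    Access("read", start=32, length=4),   # independent of the pending write
]
blocked = not can_issue(queue, 1)
free = can_issue(queue, 2)
```

Once the write at the head of the queue has been executed and removed, the first read no longer has an overlapping predecessor and can issue.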

The second storage unit 63 is configured to cache input data and output scalar data of the secondary operation modules 6.

The third storage unit 64 is configured to cache convolution kernel data required by the secondary operation modules 6 in a computation process.

FIG. 1H is a flowchart of executing a convolutional neural network by aconvolutional neural network operation device according to an example ofthe present disclosure. As shown in FIG. 1H, a process of executing theconvolutional neural network neural network instruction includes:

a step S1, pre-storing an IO instruction in a starting address of theinstruction storage unit 1;

a step S2, the operation starts, reading, by the controller unit 2, theIO instruction from the starting address of the instruction storage unit1, and according to a control signal decoded from the instruction,reading, by the data access unit 3, all corresponding convolutionalneural network operation instructions from an external address space,and caching the instructions in the instruction storage unit 1;

a step S3, reading, by the controller unit 2, a next IO instruction from the instruction storage unit, and according to a control signal obtained by decoding, reading, by the data access unit 3, all data (such as input data, an interpolation table for a quick activation function operation, a constant table for configuring parameters of the operation device, bias data, etc.) required by the primary operation module 5 from the external address space to the first storage unit 53 of the primary operation module 5;

a step S4, reading, by the controller unit 2, a next IO instruction from the instruction storage unit, and according to a control signal decoded from the instruction, reading, by the data access unit 3, convolution kernel data required by the secondary operation modules 6 from the external address space;

a step S5, reading, by the controller unit 2, a next CONFIG instruction from the instruction storage unit, and according to a control signal obtained by decoding, configuring, by the device, various constants required by the computation of the neural network layer; for instance, the first operation unit 51 and the second operation unit 61 may configure values of their internal registers according to the parameters in the control signal, where the parameters include, for instance, data required by an activation function;

a step S6, reading, by the controller unit 2, a next COMPUTE instruction from the instruction storage unit, and according to a control signal decoded from the instruction, sending, by the primary operation module 5, input data in a convolution window to each secondary operation module 6 through the interconnection module 4 and saving the input data to the second storage unit 63 of the secondary operation module 6; and then moving the convolution window according to the instruction;

a step S7, according to the control signal decoded from the COMPUTE instruction, reading, by the second operation unit 61 of the secondary operation module 6, the convolution kernel from the third storage unit 64; reading the input data from the second storage unit 63 to complete the convolution operation of the input data and the convolution kernel; and returning an obtained intermediate result through the interconnection module 4;

a step S8, in the interconnection module 4, splicing intermediate results returned from the respective secondary operation modules 6 stage by stage to obtain a complete intermediate vector;

a step S9, obtaining, by the primary operation module 5, the intermediate vector returned by the interconnection module 4; traversing, by the convolution window, all input data; splicing, by the primary operation module, all returned vectors into an intermediate result; according to the control signal decoded from the COMPUTE instruction, reading bias data from the first storage unit 53, adding the intermediate result and the bias data in a vector addition unit 511 to obtain a bias result; activating the bias result by the activation unit 512, and writing final output data back to the first storage unit; and

a step S10, reading, by the controller unit 2, a next IO instruction from the instruction storage unit, and according to a control signal decoded from the instruction, storing, by the data access unit 3, the output data in the first storage unit 53 to a specified address in the external address space, whereupon the operation finishes.
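The order of the IO, CONFIG, and COMPUTE instructions in the steps S3 to S10 may be pictured with the following minimal Python sketch. It is a toy interpreter under stated assumptions: the opcode names IO_LOAD and IO_STORE, the dictionary standing in for the external address space, and the single-window dot product are all hypothetical and do not reflect the actual instruction encoding of the device.

```python
def run_program(program, external_memory):
    """Execute a toy instruction stream in order and return the opcode trace."""
    on_chip = {}   # stands in for the on-chip storage units
    config = {}    # stands in for internal configuration registers
    trace = []
    for opcode, args in program:
        if opcode == "IO_LOAD":
            # Copy the named items from external memory to on-chip storage.
            for name in args["names"]:
                on_chip[name] = external_memory[name]
        elif opcode == "CONFIG":
            config.update(args)
        elif opcode == "COMPUTE":
            # One convolution window: multiply, accumulate, add bias, activate.
            products = [w * x for w, x in zip(on_chip["kernel"], on_chip["input"])]
            h = sum(products) + on_chip["bias"]
            relu = config.get("activation") == "relu"
            on_chip["output"] = max(0.0, h) if relu else h
        elif opcode == "IO_STORE":
            external_memory[args["name"]] = on_chip["output"]
        else:
            raise ValueError(f"unknown opcode: {opcode}")
        trace.append(opcode)
    return trace


external_memory = {"input": [1.0, 2.0, 3.0], "kernel": [0.5, -1.0, 2.0], "bias": 0.25}
program = [
    ("IO_LOAD", {"names": ["input", "bias"]}),   # compare step S3
    ("IO_LOAD", {"names": ["kernel"]}),          # compare step S4
    ("CONFIG", {"activation": "relu"}),          # compare step S5
    ("COMPUTE", {}),                             # compare steps S6 to S9
    ("IO_STORE", {"name": "output"}),            # compare step S10
]
trace = run_program(program, external_memory)
```

After the program runs, the result of the single window is available under the hypothetical key "output" in external_memory.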

The implementation of a multi-layer convolutional neural network is similar to that of a single-layer convolutional neural network. After an upper layer of the convolutional neural network is executed, an operation instruction of a next layer uses an output data address of the upper layer stored in the primary operation module as an input data address of this layer. Similarly, the address of a convolution kernel and the address of bias data in the instruction may also be changed to addresses corresponding to this layer.

The present disclosure uses a device and an instruction set for performing the convolutional neural network forward operation, which addresses the insufficient computation performance of CPUs and GPUs and the high front-end decoding overhead. The present disclosure effectively improves support for the forward operation of a multi-layer convolutional neural network.

By using a dedicated on-chip cache for the forward operation of a multi-layer convolutional neural network, input neurons and convolution kernel data may be fully reused, which may avoid repeated reading of these data from the memory, reduce the memory access bandwidth, and prevent the memory bandwidth from becoming a performance bottleneck of the forward operation of a multi-layer artificial neural network.

Based on the above examples, FIG. 2 shows an information processing method according to an example of the present disclosure. The method shown in FIG. 2 may include:

a step S102, obtaining a first image to be processed, where the first image has a resolution of a first-level size.

The first image may be a picture or a video frame image, and the number of first images is not limited herein. In other words, the first image may be one or more pictures, or frame images of one or more video segments.

The method may further include a step S104, using, by the computation device, the first image as input of the operation unit to call the operation instruction to perform resolution optimization on the first image, so as to obtain a second image, where

the second image has a resolution of a second-level size, the first-level size is smaller than the second-level size, and the operation instruction is a preset instruction for optimizing an image resolution.

The operation instruction includes, but is not limited to, a convolution operation instruction, a pooling instruction, a normalization instruction, a non-linear activation instruction, and the like. The process of calling related operation instructions in the computation device (such as in the operation unit) to perform resolution optimization on the first image is not repeated here; for details, please refer to the descriptions of calling related instructions in the above examples of FIG. 1.

The first-level size and the second-level size are both used to describe the resolution of an image, and the first-level size is less than or equal to the second-level size. For example, the resolution size of the first image (the first-level size) is 800×600, and the resolution size of the second image (the second-level size) may be 1024×768.

Some examples involved in the present disclosure are described below.

In the step S102, an input format of the first image may be an image format such as bmp, gif, jpeg, etc., or may be multi-dimensional matrix data converted from pixels of the image.

In an optional example, the step S102 specifically includes: obtaining an original image to be processed input by a user and pre-processing the original image to obtain the first image to be processed, where the pre-processing is an operation customized by the user side or the terminal side (the computation device side) and includes one or more of the following processing: translation, scaling transformation, non-linear transformation, normalization, format conversion, data deduplication, data exception processing, missing data filling, color conversion, and image restoration.

In a specific implementation, the computation device obtains an original image to be processed input by a user. The original image is not further described herein; for details, please refer to the related description of the first image. Further, the computation device may call a related operation instruction to perform pre-processing, such as normalization, format conversion, color conversion, etc., on the original image to obtain the first image to be processed. The pre-processing includes, but is not limited to, format conversion (such as normalization processing and the like), color conversion (such as converting into a gray-scale image), image restoration, image modification, and other processing. Correspondingly, the operation instruction may be an instruction related to the pre-processing. For instance, when the pre-processing is the normalization processing, the corresponding operation instruction is a normalization instruction.
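For illustration only, two of the pre-processing operations named above (color conversion into a gray-scale image, and normalization) may be sketched in Python as follows. The function names, the use of the ITU-R BT.601 luma weights, and the 8-bit pixel range are assumptions of this sketch; the disclosure does not specify how these operations are computed.

```python
def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) pixels to gray levels.

    The weights 0.299, 0.587, 0.114 are the ITU-R BT.601 luma coefficients,
    chosen here as an assumption.
    """
    return [
        [0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
        for row in rgb_image
    ]


def normalize(image, max_value=255.0):
    """Scale pixel values into the range [0, 1], assuming 8-bit input."""
    return [[pixel / max_value for pixel in row] for row in image]


original = [[(255, 255, 255), (0, 0, 0)],
            [(255, 0, 0), (0, 0, 255)]]
first_image = normalize(to_grayscale(original))
```

In this sketch the variable first_image plays the role of the first image to be processed, expressed as a two-dimensional matrix.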

The pre-processing includes, but is not limited to, any one or more of the following: data format conversion (such as normalization, integer data conversion, etc.), data deduplication, data exception processing, filling missing data, scaling, translation, and the like. For instance, the data format conversion may specifically be: conversion between continuous data and discrete data; power conversion, which is to convert non-power weight data in input data (a multi-dimensional matrix of the target image) of a neural network to power weight data; statistics of floating-point data, which is to count bits of exponent offset and exponent bits required for storing different types of data during a forward operation of the artificial neural network; and floating-point data conversion, which is to convert between a short-bit floating point data type and a long-bit floating point data type, and the like, which is not limited in the present disclosure.
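Two of the data format conversions listed above may be sketched in Python as follows. This is an illustration under assumptions: the disclosure does not define the rounding rule for power conversion or the bit widths of the short-bit and long-bit floating point types, so the sketch rounds to the nearest power of two in the log domain and uses IEEE 754 half precision as the short-bit type. The function names are hypothetical.

```python
import math
import struct


def to_power_weight(weight):
    """Replace a weight by a signed power of two (assumed rounding rule)."""
    if weight == 0:
        return 0.0
    exponent = round(math.log2(abs(weight)))
    return math.copysign(2.0 ** exponent, weight)


def to_short_float(value):
    """Round-trip a Python float through a 16-bit float.

    The "e" format character of the struct module denotes IEEE 754
    binary16; the returned value shows the precision that survives.
    """
    return struct.unpack("<e", struct.pack("<e", value))[0]


power_weights = [to_power_weight(w) for w in (0.3, -5.0, 0.0)]
short_value = to_short_float(0.1)
```

A power weight can be stored as a sign and an exponent, so that a multiplication by the weight can in principle be carried out as a shift; this motivation is a general observation and is not stated in the text above.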

It should be understood that, when the pre-processing is a processing such as translation, scaling transformation, and non-linear operation conversion, the computation device converts the first image into image pixel data that can be recognized by the device. Conversely, when a subsequent computation device performs similar pre-processing on the image pixel data, it can be understood that the computation device can map/convert the image pixel data into a corresponding image and output the image to the user for viewing.

The step S104 has the following two implementations.

As a specific implementation of S104, the computation device calls an operation instruction to perform feature extraction on the first image, so as to directly obtain and output the second image.

As another implementation of S104, the computation device calls an operation instruction to perform feature extraction on the first image to obtain a feature image, then pre-processes the feature image to obtain the second image, where the pre-processing is an operation preset by a user side or a terminal side.

Firstly, some examples involved in the feature extraction are described below. The purpose of the feature extraction in the present disclosure is to optimize the resolution of the first image so as to change it into a super-resolution second image. In other words, the feature extraction in the present disclosure can be regarded as image resolution optimization.

Specifically, the computation device may call related instructions in the operation unit to perform feature extraction on the first image to obtain a feature image. It should be understood that when an expression form of the first image is a multi-dimensional matrix, the feature extraction performed on the first image is a process of data dimensionality reduction and resolution optimization, which may reduce the complexity of data processing to a certain extent, reduce the computation load of a computation device, and improve computation efficiency.

In an optional example, the operation instruction may be an instruction for feature extraction. For details, please refer to related descriptions in the above examples.

In an optional example, the operation instruction may include any one or more of the following instructions: a convolution operation instruction, a normalization instruction, a non-linear activation instruction, and a pooling instruction. It should be noted that when there are a plurality of operation instructions (which can also be called an operation instruction set), the order, count, and calling thread of the respective operation instructions called in the operation instruction set may be customized by the user side or the computation device side (such as a terminal), which is not limited herein.

FIG. 3 shows a schematic diagram of calling an operation instruction based on a single thread to perform feature extraction. Specifically, the controller unit may fetch a convolution operation instruction from the register unit and send the convolution operation instruction to the operation unit to process the first image, thereby obtaining a first intermediate image. Then the controller unit may fetch a normalization instruction from the register unit and send the normalization instruction to the operation unit to process the first intermediate image, thereby obtaining a second intermediate image. The controller unit may fetch a non-linear activation instruction from the register unit and send the non-linear activation instruction to the operation unit to process the second intermediate image, thereby obtaining a third intermediate image. Then the controller unit may fetch a pooling instruction from the register unit and send the pooling instruction to the operation unit to process the third intermediate image, thereby obtaining a feature image after feature extraction.
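The single-thread sequence of convolution, normalization, non-linear activation, and pooling may be sketched in Python as follows. This is a minimal sketch under assumptions: the image is a two-dimensional list, the convolution is a "valid" convolution without padding, the normalization is min-max scaling, the activation is ReLU, and the pooling is 2×2 max pooling. None of these choices is specified by the disclosure, and all function names are hypothetical.

```python
def convolve(image, kernel):
    """Valid 2-D convolution (no padding, stride 1, kernel not flipped)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def normalize(image):
    """Min-max scale all values into [0, 1]."""
    flat = [p for row in image for p in row]
    low, high = min(flat), max(flat)
    span = (high - low) or 1.0
    return [[(p - low) / span for p in row] for row in image]


def relu(image):
    """Non-linear activation: clamp negative values to zero."""
    return [[max(0.0, p) for p in row] for row in image]


def max_pool(image, size=2):
    """Non-overlapping max pooling over size x size blocks."""
    return [
        [
            max(image[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(image[0]) - size + 1, size)
        ]
        for r in range(0, len(image) - size + 1, size)
    ]


def extract_features(first_image, kernel):
    """Apply the four stages in the order shown for the single thread."""
    stages = [lambda img: convolve(img, kernel), normalize, relu, max_pool]
    intermediate = first_image
    for stage in stages:
        intermediate = stage(intermediate)
    return intermediate


first_image = [[r * 5 + c for c in range(5)] for r in range(5)]
feature_image = extract_features(first_image, [[1, 0], [0, 1]])
```

Because the stages are held in a list, reordering them corresponds to changing the order in which the instructions are called.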

Optionally, when each operation instruction shown in FIG. 3 is called for execution, the execution order may be changed; for instance, the normalization instruction may be called before the convolution operation instruction, which is not limited herein.

In an optional example, the present disclosure supports multi-thread (multiple pipelines) feature extraction processing. In other words, the feature extraction in the present disclosure may be implemented by thread splitting or merging. Implementations of thread splitting include, but are not limited to, data copying, data grouping, and the like, while implementations of thread merging include, but are not limited to, data addition and subtraction, data multiplication, data combination and arrangement, and the like.

FIG. 4 shows a schematic diagram of calling an operation instruction based on multiple threads to perform feature extraction. Specifically, the computation device may perform data operations of two threads at the same time. The operation instructions to be used in each thread may be the same or different, and the order and count of calling the operation instructions are not limited herein. As shown in FIG. 4, one of the threads sequentially executes the operation instructions in FIG. 3 twice while, at the same time, the other thread sequentially executes the operation instructions in FIG. 3 once.

It should be noted that when multi-thread feature extraction is involved in the present disclosure, the feature image after feature extraction may be obtained by aggregating result data processed by each thread. In other words, the feature image data after the feature extraction may include, but is not limited to, a plurality of pieces of matrix data with the same dimension or different dimensions, which is not limited herein.
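Thread splitting by data copying and thread merging by data addition may be sketched in Python as follows. The sketch is an assumption-laden illustration: the two placeholder instructions (halve and relu), the use of ThreadPoolExecutor from the standard library, and element-wise addition as the merge step are chosen for brevity and are not prescribed by the disclosure.

```python
from concurrent.futures import ThreadPoolExecutor


def halve(vector):
    """Placeholder for one operation instruction."""
    return [x / 2 for x in vector]


def relu(vector):
    """Placeholder for a non-linear activation instruction."""
    return [max(0.0, x) for x in vector]


def run_thread(data, instruction_set, repeats):
    """Execute the instruction set in order, the given number of times."""
    for _ in range(repeats):
        for instruction in instruction_set:
            data = instruction(data)
    return data


def multi_thread_features(data):
    instruction_set = [halve, relu]
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Splitting by data copying: each thread receives its own copy.
        first = pool.submit(run_thread, list(data), instruction_set, 2)
        second = pool.submit(run_thread, list(data), instruction_set, 1)
        a, b = first.result(), second.result()
    # Merging by data addition, element by element.
    return [x + y for x, y in zip(a, b)]


feature = multi_thread_features([4.0, -2.0, 8.0])
```

Here one thread runs the instruction set twice and the other runs it once, and the results of both threads are aggregated into one feature vector.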

Secondly, the examples involved in pre-processing the feature image are introduced below. For details, please refer to the related description of pre-processing the original image in the step S102. The pre-processing includes, but is not limited to, translation, scaling, non-linear operation, and the like.

The examples of the present disclosure are briefly introduced below in combination with the examples of FIGS. 1A-1H.

In the step S102, the computation device obtains a first image to be processed input by the user. In an optional example, the communication unit may be the storage medium (the off-chip memory) shown in FIG. 1A or be an input/output (IO) unit, which is not limited herein.

In an optional example, the computation device may be the computation device shown in FIG. 1A or FIG. 1D. Specifically, the computation device can store various operation instructions in the register unit or the instruction storage unit through the data access unit; further, the computation device can read, write, and store various operation instructions through the data access unit. The controller unit is configured to control the reading of various operation instructions from the register unit (or the instruction storage unit, etc.) and decode each operation instruction into an executable operation instruction. Optionally, the controller unit may also send the operation instruction to the operation unit for execution. Specifically, related arithmetic units can be called in turn for data processing according to the computation topology corresponding to the operation instruction. The convolution operation instruction is described in detail below as an instance. The interconnection module is configured to receive input data (the first image) and a computation topology, where the computation topology is a topology corresponding to the operation instruction. For instance, when the operation instruction is a convolution operation instruction, the corresponding computation topology may be: the multiplication arithmetic unit, then the addition arithmetic unit, then (optionally) the activation arithmetic unit. Each type of arithmetic unit is configured to perform a corresponding operation; for instance, the multiplication arithmetic unit is configured to perform a multiplication operation, etc., which will not be further described in the present disclosure.

Other descriptions of the step S102 are similar to those of the above examples, which will not be further described herein.

Correspondingly, specific implementations of the step S104 are described below.

In a specific implementation, the computation device fetches a corresponding operation instruction from the register unit (or the instruction storage unit) through the controller unit and the data access unit, where the operation instruction is configured to process the first image (which may specifically be resolution optimization). For the operation instruction, please refer to the related introduction in the above examples; for instance, the instruction may be an operation instruction for resolution optimization. The count of the operation instructions is not limited herein.

Further, after the controller unit fetches the operation instruction, the controller unit sends the operation instruction to the operation unit to perform resolution optimization on the first image in the operation unit according to the computation topology corresponding to the operation instruction, so as to obtain the second image.

A specific implementation process of the step S104 is described in detail below with the operation instruction being a convolution operation instruction as an instance.

In a specific implementation, referring to the computation device shown in FIG. 1A, the computation device obtains a first image to be processed input by a user through the communication unit (or a storage medium, or an off-chip memory). Optionally, the computation device may call a related computation instruction to perform conversion of a preset format on the first image, thereby obtaining image data which can be identified and processed by the computation device, such as a matrix or vector composed of i pieces of pixel data x_i. The preset format is customized by the user side or the computation device side. Further, the computation device fetches a convolution operation instruction from the register unit through the data access unit and the controller unit, and sends the convolution operation instruction to the operation unit for execution; in other words, the formula to be computed is S = s(Σ W·x_i + b), where W is the convolution kernel, x_i is the input data, and b is the bias. Correspondingly, the computation device controls the operation unit to execute the convolution operation instruction on the input data x_i (the first image). Specifically, the computation device calls the multiplication arithmetic unit in the operation unit to multiply the convolution kernel W by the input data x_i, calls the addition arithmetic unit to find the sum, adds the bias b, and then calls the activation arithmetic unit to perform an activation operation s(h), so as to obtain a final output result S. The output result is the second image or intermediate data. When the output result is intermediate data, according to a similar computation principle of the above convolution operation instruction, the computation device may further call other operation instructions to process the intermediate data. The process is repeated until the second image is obtained.
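The multiply, sum, add-bias, and activate sequence described above may be sketched in Python as follows. The sketch assumes a flattened kernel and input of equal length and uses the sigmoid function as the activation s(h); both are assumptions for illustration, since the text leaves the activation function open.

```python
import math


def sigmoid(h):
    """One possible activation function s(h)."""
    return 1.0 / (1.0 + math.exp(-h))


def convolution_output(kernel, inputs, bias, activation=sigmoid):
    """Compute S = s(sum(W * x_i) + b) for one convolution window."""
    products = [w * x for w, x in zip(kernel, inputs)]  # multiplication unit
    h = sum(products) + bias                            # addition unit, then bias
    return activation(h)                                # activation unit


S = convolution_output(kernel=[1.0, -1.0], inputs=[2.0, 2.0], bias=0.0)
```

In this sketch the three commented lines correspond to the multiplication arithmetic unit, the addition arithmetic unit, and the activation arithmetic unit of the computation topology.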

In another specific implementation, referring to the computation device shown in FIG. 1D, the process is similar to the above implementation of the step S104, except that the computation device shown in FIG. 1D is used. The operation unit may specifically include a primary operation module, secondary operation modules, and an interconnection module connecting the primary operation module and the secondary operation modules. The interconnection module may be configured to transfer data between the primary operation module and the secondary operation modules, receive a computation topology corresponding to an operation instruction, etc. The computation device may control an implementation of the bias b operation and the activation operation s(h) of the convolution operation in the primary operation module, and control an implementation of the vector multiplication operation W·x_i and the accumulation operation Σ in the respective secondary operation modules. Specifically, the computation device may transfer the input data x_i (the first image) to each secondary operation module through the controller unit, so as to first call a multiplication arithmetic unit in each secondary operation module to multiply a convolution kernel W by the input data x_i, and then call an addition arithmetic unit to sum and obtain an output scalar. Then the interconnection module is configured to accumulate and splice output scalars of the respective secondary operation modules stage by stage into an intermediate vector and send the intermediate vector to the primary operation module. Further, the computation device calls the addition arithmetic unit in the primary operation module to splice intermediate vectors corresponding to all input data into an intermediate result, adds a bias b to the intermediate result, and then calls an activation arithmetic unit to perform an activation operation s(h) to obtain a final output result S.
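The division of work between the primary operation module and the secondary operation modules may be sketched in Python as follows. This is a simplified, sequential illustration: a one-dimensional input, one kernel per secondary module, one bias per kernel, and ReLU as the activation are assumptions of the sketch, and the function names are hypothetical. In hardware the secondary modules would operate in parallel.

```python
def secondary_module(window, kernel):
    """Vector multiplication followed by accumulation: one output scalar."""
    return sum(w * x for w, x in zip(kernel, window))


def interconnect_splice(scalars):
    """Splice the output scalars of all secondary modules into one vector."""
    return list(scalars)


def primary_module(input_data, kernels, biases, window_size):
    """Slide the window, collect intermediate vectors, add bias, activate."""
    intermediate_result = []
    for start in range(len(input_data) - window_size + 1):
        window = input_data[start:start + window_size]
        # The same window is sent to every secondary module.
        scalars = [secondary_module(window, k) for k in kernels]
        intermediate_result.append(interconnect_splice(scalars))
    # Bias addition and activation (ReLU assumed) in the primary module.
    return [
        [max(0.0, value + bias) for value, bias in zip(vector, biases)]
        for vector in intermediate_result
    ]


output = primary_module(
    input_data=[1.0, 2.0, 3.0, 4.0],
    kernels=[[1.0, 1.0], [1.0, -1.0]],
    biases=[0.0, 0.5],
    window_size=2,
)
```

Each inner list of the output corresponds to one position of the convolution window, and each entry of that list corresponds to one secondary operation module.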

For the implementation of calling related operation instructions in the computation device to process the first image, please refer to related descriptions of the above FIGS. 1A to 1H. In other words, the examples of FIGS. 1A to 1H may also be correspondingly applied to the examples of the information processing method described in FIG. 2, and will not be further described herein. It should be understood that the convolution operation instruction in the above description is only used as an instance to illustrate the convolution operation instruction calling and data processing, which is not a limitation; accordingly, when the operation instruction is another instruction instead of the convolution operation instruction, a related processing method similar to that of the convolution operation instruction may also be used to implement steps of the method examples of the present disclosure.

Based on the examples of the present disclosure, the resolution of an image may be improved. Compared with the prior art that uses a general-purpose processor and software for resolution optimization, the present disclosure may have technical effects of lower power consumption and faster speed.

FIG. 5 is a structural diagram of a computation device (which may be a terminal) according to an example of the present disclosure. The computation device shown in FIG. 5 includes a communication unit 617 and an operation unit 614, where

the communication unit 617 is configured to obtain a first image to be processed, where the first image has a resolution of a first-level size;

the operation unit 614 is configured to obtain and call an operation instruction to perform resolution optimization on the first image to obtain a second image, where

the second image has a resolution of a second-level size, the first-level size is smaller than the second-level size, and the operation instruction is a preset instruction for optimizing an image resolution.

Optionally, the computation device further includes a storage medium 611, a register unit 612, an interconnection module 613, a controller unit 615, and a data access unit 616. For the above function units, please refer to related descriptions of the examples in FIG. 1. Optionally, the communication unit and the storage medium may be the same or different. For instance, the communication unit may be a storage medium or be an IO unit of the computation device, which is not limited herein.

In an optional example,

the communication unit is configured to obtain an original image to be processed input by a user, where the original image has a resolution of the first-level size, and

the operation unit is configured to pre-process the original image to obtain the first image to be processed, where the pre-processing is an operation preset by a user side or a terminal side.

In an optional example, the computation device further includes a register unit 612 and a controller unit 615, where

the controller unit is configured to fetch an operation instruction from the register unit, and send the operation instruction to the operation unit;

the operation unit is configured to call the operation instruction to perform feature extraction on the first image to obtain a feature image; and

the operation unit is configured to pre-process the feature image to obtain the second image, where the pre-processing is an operation preset by a user side or a terminal side.

In an optional example, the pre-processing includes one or more of the following processing: translation, scaling transformation, non-linear transformation, normalization, format conversion, data deduplication, data exception processing, and missing data filling.

In an optional example,

the operation unit is configured to perform feature extraction on the first image based on an operation instruction set of at least one thread to obtain a feature image, where the operation instruction set includes at least one of the operation instructions, and an order of calling each operation instruction in the operation instruction set is customized by a user side or a terminal side.

In an optional example, the computation device further includes a data access unit and a storage medium, where

the operation unit is configured to send the second image to the data access unit and store the second image in the storage medium.

In an optional example, the operation unit includes a primary operation module and a plurality of secondary operation modules, where the primary operation module is interconnected with the plurality of secondary operation modules by an interconnection module, and when the operation instruction is a convolution operation instruction,

the secondary operation modules are configured to implement a convolution operation of input data and convolution kernels in a convolutional neural network algorithm, where the input data is the first image and the convolutional neural network algorithm corresponds to the convolution operation instruction,

the interconnection module is configured to implement data transfer between the primary operation module and the secondary operation modules; before a forward operation of a neural network fully connected layer starts, the primary operation module sends the input data to each secondary operation module through the interconnection module; and after the computation of the secondary operation modules is completed, the interconnection module splices output scalars of the respective secondary operation modules stage by stage into an intermediate vector and sends the intermediate vector back to the primary operation module, and

the primary operation module is configured to splice intermediate vectors corresponding to all input data into an intermediate result, and perform subsequent operations on the intermediate result.

In an optional example,

the primary operation module is configured to add bias data to the intermediate result, and then perform an activation operation.

In an optional example, the primary operation module includes a first operation unit, where the first operation unit includes a vector addition unit and an activation unit,

the vector addition unit is configured to implement a bias operation of a convolutional neural network operation and perform element-wise addition on bias data and the intermediate result to obtain a bias result; and

the activation unit is configured to perform an activation function operation on the bias result.

In an optional example, the primary operation module includes a first operation unit, a first data dependency determination unit, and a first storage unit, where

the first storage unit is configured to cache input data and output data used by the primary operation module during a computation process, where the output data includes the second image,

the first operation unit is configured to perform various operational functions of the primary operation module, and

the first data dependency determination unit is configured to ensure that there is no consistency conflict in reading data from and writing data to the first storage unit, read an input neuron vector from the first storage unit and send the vector to the secondary operation modules through the interconnection module, and send an intermediate result vector from the interconnection module to the first operation unit.

In an optional example, the secondary operation modules include a second operation unit, where the second operation unit includes a vector multiplication unit and an accumulation unit,

the secondary operation modules are configured to perform a convolution operation of input data and a convolution kernel in a convolutional neural network algorithm, in which:

the vector multiplication unit is configured to perform the vector multiplication operation of the convolution operation, and

the accumulation unit is configured to perform the accumulation operation of the convolution operation.

In an optional example, each secondary operation module includes a second operation unit, a second data dependency determination unit, a second storage unit, and a third storage unit, where

the second operation unit is configured to perform various arithmetic and logical operations of the secondary operation modules,

the second data dependency determination unit is configured to perform reading/writing operations on the second storage unit and the third storage unit during a computation process and to ensure that there is no consistency conflict between the reading and writing operations on the second storage unit and the third storage unit,

the second storage unit is configured to cache input data and an output scalar obtained from the computation performed by the secondary operation module, and

the third storage unit is configured to cache a convolution kernel required by the secondary operation module in the computation process.

In an optional example, the first data dependency determination unit or the second data dependency determination unit ensures that there is no consistency conflict in reading and writing in the following manners: storage addresses corresponding to data/instructions stored in the corresponding storage unit do not overlap; or determining whether a control signal that has not been executed depends on data of a control signal that is being executed; if there is no dependency, the control signal is allowed to be issued immediately; otherwise, the control signal is not allowed to be issued until all control signals on which it depends have been executed; where

the computation device controls the controller unit to obtain an operation instruction from the register unit and decode the operation instruction into the control signal for controlling behavior of other modules, where the other modules include the primary operation module and the plurality of secondary operation modules.

In an optional example, the plurality of secondary operation modules are configured to compute respective output scalars in parallel using the same input data and respective convolution kernels.
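
As a rough software analogy only, the parallel computation of output scalars might be sketched as follows. Each worker thread stands in for one secondary operation module, and each kernel is flattened to a vector of the same length as the input. The function names and the use of a thread pool are assumptions made for illustration and do not describe the hardware.

```python
from concurrent.futures import ThreadPoolExecutor


def output_scalar(input_data, kernel):
    """Vector multiplication followed by accumulation, giving one scalar."""
    if len(input_data) != len(kernel):
        raise ValueError("input data and kernel must have the same length")
    products = [x * w for x, w in zip(input_data, kernel)]  # vector multiplication
    total = 0.0
    for p in products:  # accumulation
        total += p
    return total


def compute_output_scalars(input_data, kernels):
    """Every worker receives the same input data and its own kernel.
    pool.map returns results in the order of the kernels."""
    with ThreadPoolExecutor(max_workers=max(1, len(kernels))) as pool:
        return list(pool.map(lambda k: output_scalar(input_data, k), kernels))
```

For instance, `compute_output_scalars([1, 2, 3], [[1, 0, 0], [0, 1, 0], [1, 1, 1]])` yields one scalar per kernel, in kernel order.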

In an optional example, an activation function active used by the primary operation module may be any of the following non-linear functions: sigmoid, tanh, relu, softmax, or may be a linear function.
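
The listed activation functions can be sketched in plain Python as below. This is an illustrative sketch rather than the disclosed circuit: the dispatch table `ACTIVATIONS`, the identity choice for the linear function, and the helper `bias_then_activate` (element-wise bias addition followed by activation, as recited in the claims) are hypothetical names and choices introduced here.

```python
import math


def sigmoid(v):
    # Numerically stable form: never exponentiates a large positive number.
    out = []
    for x in v:
        if x >= 0:
            out.append(1.0 / (1.0 + math.exp(-x)))
        else:
            e = math.exp(x)
            out.append(e / (1.0 + e))
    return out


def tanh(v):
    return [math.tanh(x) for x in v]


def relu(v):
    return [max(0.0, x) for x in v]


def softmax(v):
    m = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in v]
    s = sum(exps)
    return [e / s for e in exps]


def linear(v):
    # Assumption: the linear function is taken to be the identity.
    return list(v)


ACTIVATIONS = {"sigmoid": sigmoid, "tanh": tanh, "relu": relu,
               "softmax": softmax, "linear": linear}


def activate(name, v):
    """Apply the named activation function to a vector."""
    return ACTIVATIONS[name](v)


def bias_then_activate(intermediate, bias, name):
    """Element-wise bias addition followed by the chosen activation."""
    if len(intermediate) != len(bias):
        raise ValueError("bias must match the intermediate result in length")
    return activate(name, [a + b for a, b in zip(intermediate, bias)])
```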

In an optional example, the interconnection module forms a data channel for continuous or discrete data between the primary operation module and the plurality of secondary operation modules. The interconnection module has any of the following structures: a tree structure, a ring structure, a grid structure, a hierarchical interconnection, and a bus structure.
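
For the tree structure, the stage-by-stage splicing of output scalars into an intermediate vector, as recited in the claims, could be modeled as below. The sketch assumes a binary tree in which each stage concatenates neighboring partial vectors. The binary fan-in and the function name are assumptions for illustration only.

```python
def splice_stage_by_stage(output_scalars):
    """Merge per-module output scalars pairwise, one tree level per stage,
    until a single intermediate vector remains."""
    # Leaves of the tree: one single-element vector per secondary module.
    level = [[s] for s in output_scalars]
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            # Concatenate a node with its right neighbor, if it has one.
            right = level[i + 1] if i + 1 < len(level) else []
            next_level.append(level[i] + right)
        level = next_level
    return level[0] if level else []
```

The order of the scalars is preserved, so the resulting vector lists the outputs in the order of the secondary operation modules.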

For those which are not shown or described in the present disclosure, please refer to related descriptions of the above examples.

An example of the present disclosure further provides a computer storage medium on which a computer program is stored for electronic data exchange. The computer program may cause a computer to perform part or all of the steps of any information processing method described in the foregoing method examples.

An example of the present disclosure further provides a computer program product, where the computer program product includes a non-transitory computer-readable storage medium on which a computer program is stored. The computer program may cause a computer to perform part or all of the steps of any information processing method described in the foregoing method examples.

An example of the present disclosure also provides an acceleration device which includes: a memory which stores executable instructions, and a processor configured to execute the executable instructions in the memory according to the information processing method.

The processor may be a single processing unit, or may include two or more processing units. In addition, the processor may also include a general-purpose processor (CPU) or a graphics processing unit (GPU), a field programmable gate array (FPGA), or an application-specific integrated circuit (ASIC) to set up and operate a neural network. The processor may also include an on-chip memory for caching (including a memory in the processing device).

In some examples, the present disclosure provides a chip which includes the above neural network processor configured to execute the information processing method.

In some examples, the present disclosure provides a chip package structure which includes the above chip.

In some examples, the present disclosure provides a board card which includes the above chip package structure.

In some examples, the present disclosure provides an electronic device which includes the above board card.

The electronic device may include a data processing device, a robot, a computer, a printer, a scanner, a tablet, a smart terminal, a mobile phone, a traffic recorder, a navigator, a sensor, a webcam, a server, a cloud-based server, a camera, a video camera, a projector, a watch, a headphone, a mobile storage, a wearable device, a vehicle, a household appliance, and/or medical equipment.

The vehicle may include an airplane, a ship, and/or a car. The household appliance may include a television, an air conditioner, a microwave oven, a refrigerator, an electric rice cooker, a humidifier, a washing machine, an electric lamp, a gas cooker, and a range hood. The medical equipment may include a nuclear magnetic resonance spectrometer, a B-ultrasonic scanner, and/or an electrocardiograph.

It should be noted that, for the sake of conciseness, the foregoing method examples are all described as a series of action combinations. However, those skilled in the art should know that the present disclosure is not limited by the described order of actions, since according to the present disclosure the steps may be performed in a different order or simultaneously. Furthermore, those skilled in the art should also understand that the examples described in the specification are all optional, and the actions and modules involved are not necessarily required for the present disclosure.

In the examples above, the description of each example has its own emphasis. For a part that is not described in detail in one example, reference may be made to related descriptions in other examples.

It should be understood that in the examples provided by the present disclosure, the disclosed device may be implemented in another manner. For instance, the examples above are merely illustrative. For instance, the division of the units is only a logical function division. In a certain implementation, there may be another manner for division. For instance, a plurality of units or components may be combined or may be integrated in another system, or some features can be ignored or not performed. In addition, the displayed or discussed mutual coupling or direct coupling or communication connection may be implemented through indirect coupling or communication connection of some interfaces, devices or units, and may be electrical or other forms.

The units described as separate components may or may not be physically separated. The components shown as units may or may not be physical units. In other words, the components may be located in one place, or may be distributed to a plurality of network units. According to certain needs, some or all of the units can be selected for realizing the purposes of the examples of the present disclosure.

In addition, the functional units in each example of the present application may be integrated into one processing unit, or each of the units may exist separately and physically, or two or more units may be integrated into one unit. The integrated units above may be implemented in the form of hardware or in the form of software program modules.

When the integrated units are implemented in the form of a software program module and sold or used as an independent product, they may be stored in a computer-readable memory. Based on such understanding, the essence of the technical solutions of the present disclosure, or a part of the present disclosure that contributes to the prior art, or all or part of the technical solutions, can be wholly or partly embodied in the form of a software product that is stored in a memory. The software product includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the examples of the present disclosure. The foregoing memory includes: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a mobile hard disk, a magnetic disk, an optical disc, and other media that can store program codes.

A person of ordinary skill in the art may understand that all or part of the steps of the foregoing examples of method may be completed by a program instructing related hardware. The program may be stored in a computer-readable memory, and the memory may include a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like.

The examples of the present disclosure have been described in detail above. Specific examples have been used in the specification to explain the principles and implementation manners of the present disclosure. The descriptions of the above examples are only used to facilitate understanding of the methods and core ideas of the present disclosure. Persons of ordinary skill in the art may change the implementation and application scope according to the ideas of the present application. In summary, the content of this specification should not be construed as a limitation on the present disclosure.

What is claimed is:
 1. An information processing method applied to a computation circuit, wherein the computation circuit includes a communication circuit and an operation circuit, and the method comprises: controlling, by the computation circuit, the communication circuit to obtain a first image to be processed, wherein the first image has a resolution of a first-level size; and controlling, by the computation circuit, the operation circuit to obtain and execute an operation instruction to perform resolution optimization on the first image to obtain a second image, wherein the second image has a resolution of a second-level size, the first-level size is smaller than the second-level size, and the operation instruction is a preset instruction for optimizing an image resolution.
 2. The method of claim 1, wherein the controlling, by the computation circuit, the communication circuit to obtain a first image to be processed includes: controlling, by the computation circuit, the communication circuit to obtain an original image to be processed input by a user, wherein the original image has a resolution of the first-level size, and controlling, by the computation circuit, the operation circuit to pre-process the original image to obtain the first image to be processed, wherein the pre-processing is an operation preset by a user side or a terminal side.
 3. The method of claim 1, wherein the computation circuit further includes a register circuit and a controller circuit, and the controlling, by the computation circuit, the operation circuit to obtain and call an operation instruction to perform resolution optimization on the first image, so as to obtain the second image includes: controlling, by the computation circuit, the controller circuit to fetch an operation instruction from the register circuit, and sending, by the computation circuit, the operation instruction to the operation circuit, controlling, by the computation circuit, the operation circuit to call the operation instruction to perform feature extraction on the first image to obtain a feature image, and controlling, by the computation circuit, the operation circuit to pre-process the feature image to obtain the second image, wherein the pre-processing is an operation preset by a user side or a terminal side.
 4. The method of claim 3, wherein the pre-processing includes one or more of the following processing: translation, scaling transformation, non-linear transformation, normalization, format conversion, data deduplication, processing of data exception, and data missing filling.
 5. The method of claim 3, wherein the calling the operation instruction to perform feature extraction on the first image to obtain a feature image includes: controlling, by the computation circuit, the operation circuit to perform feature extraction on the first image based on an operation instruction set of at least one thread to obtain a feature image, wherein the operation instruction set includes at least one operation instruction, and an order of calling the operation instruction in the operation instruction set is customized by a user side or a terminal side.
 6. The method of claim 1, wherein the computation circuit further includes a data access circuit and a storage medium, and the computation circuit controls the operation circuit to send the second image to the data access circuit and store the second image in the storage medium.
 7. The method of claim 1, wherein the operation circuit includes a primary operation module and a plurality of secondary operation modules, wherein the primary operation module is interconnected with the plurality of secondary operation modules by an interconnection module, and when the operation instruction is a convolution operation instruction, the calling the operation instruction to perform resolution optimization on the first image includes: controlling, by the computation circuit, the secondary operation modules to implement a convolution operation of input data and a convolution kernel in a convolutional neural network algorithm, wherein the input data is the first image and the convolutional neural network algorithm corresponds to the convolution operation instruction, controlling, by the computation circuit, the interconnection module to implement data transfer between the primary operation module and the secondary operation modules, before a forward operation of a neural network fully connected layer starts, transferring, by the primary operation module, the input data to each secondary operation module through the interconnection module, and after the computation of the secondary operation modules is completed, splicing, by the interconnection module, output scalars of the respective secondary operation modules stage by stage to obtain an intermediate vector, and sending the intermediate vector back to the primary operation module, and controlling, by the computation circuit, the primary operation module to splice intermediate vectors corresponding to all input data into an intermediate result for subsequent operations.
 8. The method of claim 7, wherein the performing subsequent operations on the intermediate result includes: controlling, by the computation circuit, the primary operation module to add bias data to the intermediate result, and then performing an activation operation.
 9. The method of claim 8, wherein the primary operation module includes a first operation circuit, wherein the first operation circuit includes a vector addition circuit and an activation circuit, and the controlling, by the computation circuit, the primary operation module to add bias data to the intermediate result, and then performing an activation operation includes: controlling, by the computation circuit, the vector addition circuit to implement a bias addition operation of a convolutional neural network operation and perform element-wise addition on bias data and the intermediate result to obtain a bias result, and controlling, by the computation circuit, the activation circuit to perform an activation function operation on the bias result.
 10. (canceled)
 11. The method of claim 7, wherein each secondary operation module includes a second operation circuit, wherein the second operation circuit includes a vector multiplication circuit and an accumulation circuit, and the controlling, by the computation circuit, the secondary operation modules to perform a convolution operation of input data and a convolution kernel in a convolutional neural network algorithm includes: controlling, by the computation circuit, the vector multiplication circuit to perform a vector multiplication operation of the convolution operation, and controlling, by the computation circuit, the accumulation circuit to perform an accumulation operation of the convolution operation.
 12. (canceled)
 13. (canceled)
 14. The method of claim 7, wherein the computation circuit controls the plurality of secondary operation modules to compute respective output scalars in parallel by using the same input data and respective convolution kernels.
 15. A computation circuit, comprising a communication circuit and an operation circuit, wherein the communication circuit is configured to obtain a first image to be processed, wherein the first image has a resolution of a first-level size; and the operation circuit is configured to obtain and call an operation instruction to perform resolution optimization on the first image to obtain a second image, wherein the second image has a resolution of a second-level size, the first-level size is smaller than the second-level size, and the operation instruction is a preset instruction for optimizing an image resolution.
 16. The computation circuit of claim 15, wherein the communication circuit is configured to obtain an original image to be processed input by a user, wherein the original image has a resolution of the first-level size, and the operation circuit is configured to pre-process the original image to obtain the first image to be processed, wherein the pre-processing is an operation preset by a user side or a terminal side.
 17. The computation circuit of claim 15, further comprising a register circuit and a controller circuit, wherein the controller circuit is configured to fetch an operation instruction from the register circuit, and send the operation instruction to the operation circuit, the operation circuit is configured to call the operation instruction to perform feature extraction on the first image to obtain a feature image, and the operation circuit is configured to pre-process the feature image to obtain the second image, wherein the pre-processing includes one or more of the following processing: translation, scaling transformation, non-linear transformation, normalization, format conversion, data deduplication, processing of data exception, and data missing filling.
 18. The computation circuit of claim 17, wherein the operation circuit is configured to perform feature extraction on the first image based on an operation instruction set of at least one thread to obtain a feature image, wherein the operation instruction set includes at least one operation instruction, and an order of calling the operation instruction in the operation instruction set is customized by a user side or a terminal side.
 19. The computation circuit of claim 15, wherein the operation circuit includes a primary operation module and a plurality of secondary operation modules, wherein the primary operation module is interconnected with the plurality of secondary operation modules by an interconnection module, and when the operation instruction is a convolution operation instruction, the secondary operation modules are configured to implement a convolution operation of input data and a convolution kernel in a convolutional neural network algorithm, wherein the input data is the first image and the convolutional neural network algorithm corresponds to the convolution operation instruction, the interconnection module is configured to implement data transfer between the primary operation module and the secondary operation modules, before a forward operation of a neural network fully connected layer starts, the primary operation module sends the input data to each secondary operation module through the interconnection module, and after the computation of the secondary operation modules is completed, the interconnection module splices output scalars of the respective secondary operation modules stage by stage into an intermediate vector and sends the intermediate vector back to the primary operation module, and the primary operation module is configured to splice intermediate vectors corresponding to all input data into an intermediate result, and perform subsequent operations on the intermediate result, wherein the primary operation module is configured to add bias data to the intermediate result, and then perform an activation operation.
 20. (canceled)
 21. (canceled)
 22. (canceled)
 23. The computation circuit of claim 19, wherein the plurality of secondary operation modules use the same input data and respective convolution kernels to compute respective output scalars in parallel.
 24. (canceled)
 25. (canceled)
 26. (canceled)